INTRODUCTION



There are many general-purpose microprocessors
available, yet none shows significant advantages when
used in high-level language and special-purpose
applications. A micro-programmed machine can be
designed to take into account the general and
specialized needs of a particular system. The resulting
processor need not be much larger than a standard
MOS-based microprocessor design; however, it will
show significant improvement in performance.

This application note is the result of the authors' efforts
in creating a high-performance, 16-bit WCS (Writable
Control Store) computer for practical as well as
experimental use. Given the modifiable Control Store,
along with considerable parallelism, such a computer is
a perfect vehicle for high-level language or protocol
execution, or modifiable controller applications. The
WCS shortens the development time needed to adapt
the micro-programmed machine to different
applications.

This project is not the result of any development 
activity at either AMD or MITEL, and neither company 
can be held responsible for the accuracy of this text or 
the design. The authors have tried to be as accurate 
as possible, and will update the text as discrepancies 
are noted. The information in this text is public 
property. 

The Computer Itself 

The main computer (sans I/O) resides on a board about
9 by 10 inches. An Am29116 is used as the
processor. The computer has bulk RAM, a cache
memory, a parallel multiplier, a pre-fetch buffer for
instructions with its own program counter, a separate
bus for I/O, and various registers and multiplexers to
accommodate pipelined execution. Heavy use of VLSI
and PAL devices allows all of the aforementioned, and
a 4k x 32 WCS, to be compressed onto one board. This
compares favorably with standard MOS
microprocessor designs.

Objectives 

The name of this board, MIP, stands for Microcoded
I-code Processor; it also alludes to the aim of one million
high-level instructions per second of execution. The
MIP board could be considered a 'working standard'
microcoded instruction processor against which novel
architectures can be compared. The invitation is there
to compare it with board-level computers built with MOS
processors of conventional design.



Another objective is to create the ultimate personal 
workstation. UCSD Pascal has been ported so far, and 
it works well. Modula 2 for the next version of 
microcode is currently under consideration. 

The Design 

The design traces its ancestry back to similar 
processors built in the late seventies and early 
eighties, by several large companies. Some of these 
processors were used as minicomputers, and some as 
dedicated processors inside large machines such as 
telephone switches. The main advantage in using 
such a processor is that, given the flexible instruction 
set, performance can be optimized for a given 
application. 

The Text 

This application note should provide enough 
information for a design team to reproduce the MIP 
processor. In addition, some guidelines as to what 
tools to assemble, and what kind of effort should be 
required to complete the project are also included. 

This application note is divided into three chapters.
The first chapter covers hardware descriptions. This is
divided into subsections describing the major
functional blocks of the processor in detail. Units in
block diagrams having numbers preceded by the
letters A, B, C refer to ICs in the detailed schematic
diagram. The first chapter should be used as a
hardware reference manual for the MIP board.

The second chapter covers the software descriptions.
This is divided into subsections describing both the
low-level microcode software and the system-level
software. The Apple Pascal Reference Manual should
be referred to by readers new to this material. This
chapter should be used as a software reference
manual for the MIP board.

The third chapter covers performance. This covers
the means used to measure performance,
benchmarking, and how to tailor the software via
intrinsic functions to enhance performance.

Chapter 3 concludes with a review of the design and a 
discussion of promising areas for future work. 
Appendix B provides brief descriptions of some of the 
AMD 29XXX family parts used in this application note. 
Detailed technical questions can be directed to the 
AMD Applications Department at (408) 982-6266. 



Chapter 1 
HARDWARE DESCRIPTION 



1.0 OVERVIEW 

To best describe the processor, the entire design is
divided into a number of functional blocks which more
or less operate autonomously. The functional blocks
are treated separately and their inter-relationships are
shown.

Figure 1.1 shows the major functional units of the MIP
board as they are connected together by two main
internal busses. The processor memory, including the
cache, is accessed via the BIU; I/O access is provided
by the I/O unit, which is essentially another BIU. The
major functional units mentioned are the only units with
direct connection to the YBUS.

Abbreviations

The abbreviations used in this application note are as
follows:

APU   =  Arithmetic Processor Unit
CCU   =  Computer Control Unit
BIU   =  Bus Interface Unit
MSD   =  Microstore Data Bus
YBUS  =  Processor Data Bus
DBUS  =  Data Bus
ABUS  =  Address Bus

The ABUS runs between the BIU and the memory.
There is another bus, named the S/D Bus (for
Source/Destination), which is really just a collection of
all the strobes and signals on board that did not get
mapped otherwise, and which fulfill control functions.

In the remainder of this overview section, these units
and busses are summarized.

The MSD Bus

The MSD Bus (Microstore Data Bus) is output from the
CCU (Computer Control Unit). The MSD Bus controls
the four major functional blocks on board: the APU,
BIU, I/O UNIT, and the CCU itself. In turn, within the
CCU, the MSD Bus is interpreted and various strobes
and signals are sent out to control registers and
buffers on board.

The YBUS

The YBUS is controlled by the CCU and provides a
high-speed data path between the APU (Arithmetic
Processor Unit), and other parts of the system. Data
on the YBUS is gated to the other busses on board, in
accordance with MSD directives and in synchronization
with these other busses.

The CCU

The CCU (Computer Control Unit) is the source of the
MSD Bus which provides instructions to the processor
and the rest of the system. The CCU also handles all of
the timings at the micro-instruction level, and produces
the S/D Bus signals to synchronize and control gating
onto all of the busses in the system.

The APU

The APU (Arithmetic Processor Unit) provides all of the
arithmetic and logical functions of the processor.
Special data-shift operations and multiply functions are
also handled here.
Figure 1.1 MIP Block Diagram



The BIU

The BIU (Bus Interface Unit) contains address and data
registers to interface the MIP to the main memory.
Some of these registers are general purpose, and
some are used only for high-level language execution.

The I/O Unit

The I/O unit is essentially another BIU. It contains a
bi-directional data register to interface to a high-speed
I/O channel.



1.1 DETAILED DESCRIPTIONS 

The Arithmetic Processing Unit (APU) 

An APU is the part of a computer which performs
arithmetic and logical operations on data. There are
usually a number of general purpose registers within
the APU which may be used for temporary storage of
variables.

When dealing with high-level language concepts, a
number of special purpose functions are often
required, such as data shift and field isolation, bit
operations, prioritize operations, and multiply and divide.
These functions are also contained within the APU.

In the MIP computer, the APU consists of an Am29116,
an Am29517, a Condition Code PAL device (CCPAL)
and diagnostic registers for the YBUS. The Am29116
provides the bulk of the arithmetic functionality and the
register file. All data shifting and field isolation
capability are contained within the Am29116
instruction set, except for dynamic shifts, which are
augmented by overlaying a field in the CCU. Certain
bit-oriented and rotate instructions normally receive a
count from the CCU (via the MSD Bus). These
instructions may be modified by the CCU so that the
count is dependent on the data on the YBUS.

All multiplications are done by the Am29517. Several 
formats are available with this device to suit different 
numerical algorithms. Divides are done using the 
Am29116 in a two-cycle-per-bit divide loop. To do 
faster divides, the Newton-Raphson method could be 
used. 
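As a rough guide, a two-cycle-per-bit loop producing a 16-bit quotient would spend about 32 cycles, or roughly 4 µs at the nominal 125 ns cycle, in the loop alone (the 16-bit quotient width is an assumption here). The Newton-Raphson alternative mentioned above can be sketched as follows. This is a floating-point illustration of the iteration only, not MIP microcode; the function names and the initial-guess constants are choices made for this sketch, and a real implementation would use fixed-point multiplies on the Am29517.

```python
import math


def reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d with the Newton-Raphson iteration x <- x * (2 - m * x)."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Normalise |d| to m * 2**e with 0.5 <= m < 1 so one initial guess fits all inputs.
    m, e = math.frexp(abs(d))
    # Linear initial estimate of 1/m on [0.5, 1); a common textbook choice.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Each pass costs two multiplies and roughly doubles the number of correct bits.
        x = x * (2.0 - m * x)
    r = math.ldexp(x, -e)  # undo the normalisation
    return -r if d < 0 else r


def divide(n: float, d: float) -> float:
    """Divide by multiplying with the approximated reciprocal."""
    return n * reciprocal(d)
```

Because every step is a multiply and a subtract, the method trades the per-bit loop for a handful of passes through the parallel multiplier.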

The MSD Bus supplies instructions for the Am29116 
and Condition Code PAL device circuit. The Source/ 
Destination Control Bus supplies decoded control 
signals (i.e. strobes, clocks, etc.) to perform major 
sequencing. 

The APU circuit is shown in detail in Appendix C. The
instruction code going into the Am29116, I9 to I12, is
from MSD bits 32-35. B20 creates these bits from
MSD bits 9 to 12, or YBUS bits 0 to 3. This allows the
YBUS to specify the count, or bit number, for certain
operations in the Am29116. This is the family of
dynamic bit-shift, rotate and field isolation instructions.

The Am29818 diagnostic pipeline registers are used
to read or write to the YBUS when testing. During
testing, if OEY is held High, any value can be placed
on the YBUS to load any register. The Am29818
YBUS registers have also proved useful in some
microcode sequences.
Figure 1.2 APU Block Diagram



The CCPAL accomplishes latching and decoding
functions for the condition code (CC). This is used by
the CCU for conditional branches. Condition codes
from the Am29116 are latched externally to give
pipelined execution and improve the cycle time of the
processor.

MSD bits 12-15 determine the polarity and selection
criteria for the condition code (CC) from the status bits.

T1 thru T4, and CT from the Am29116 are latched
when PLCLK goes from Low to High at the start of
each cycle, if SRE is Low. If SRE is High, the latched
codes are retained (Figure 1.3). T1 thru T4 are the
standard arithmetic flags: Zero, Sign, Carry and
Overflow. CT is generated by a class of instructions in
the Am29116 and is simply a delayed function of T1
thru T4. PLCLK has a 50% duty-cycle, except in the
case of Wait States which extend the Low phase by
integral cycles (62.5 ns each for an 8 MHz PLCLK).

The CCPAL equations show how the latching and
decoding are done (Figure 1.4). The APU, as
constructed, combines the Am29116, the Am29517,
and Enhanced Condition Code handling.

The Computer Control Unit (CCU)

The CCU controls and synchronizes the various units
of the computer. It provides an ordered set of
instructions to the rest of the machine. The flow of these
instructions may depend on the data.

An overview of the CCU is shown in Figure 1.5. It
consists of 3 main sections. These sections are: the
Microstore Control Section, the Exception (Interrupt)
Control Section, and the Cache and Register Control
Section.

The Microstore Control Section is the source of the
microcode instruction for all parts of the machine. The
Cache and Register Control Section 'cracks' microcode
fields into timing strobes which form collectively the
S/D Bus.

The Exception Control Unit (ECU) handles irregular
control transfers (i.e. interrupts). In some cases the
ECU may be simplified, or absent if the application
does not require it.

Microstore Control Section

Instruction execution in the processor is controlled by
a single clock, PLCLK. A micro-instruction cycle is
defined by a period of PLCLK. This clock is a 50%
duty-cycle signal with a nominal period of 125 ns. The
Low period of this clock may be extended by 62.5 ns
increments to accommodate timing conflicts in the BIU,
or may be held Low by the external HALT line.
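The timing figures above can be turned into a small calculation. The sketch below assumes only the two numbers given in the text (a 125 ns nominal period and 62.5 ns per wait-state extension); the function and constant names are illustrative.

```python
BASE_PERIOD_NS = 125.0  # nominal PLCLK period (8 MHz)
WAIT_STEP_NS = 62.5     # each wait state extends the Low phase by this much


def cycle_ns(wait_states: int = 0) -> float:
    """Length of one micro-instruction cycle with the given number of wait states."""
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    return BASE_PERIOD_NS + wait_states * WAIT_STEP_NS


def microinstructions_per_second(wait_states: int = 0) -> float:
    """Sustained micro-instruction rate if every cycle had this many wait states."""
    return 1e9 / cycle_ns(wait_states)
```

With no wait states this gives 8 million micro-instructions per second, so the stated goal of one million high-level instructions per second leaves an average budget of eight micro-cycles per high-level instruction.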

The Microstore Control Section is shown in Figure 1.6. 
The Micro-Sequencer (Am2910A) creates a 
Microstore Address which accesses the Control Store. 
The Control Store data is latched at the beginning of 
each pipeline clock cycle and forms the MSD Bus. 
Most of the MSD bits go to other parts of the machine, 
but 4 bits (28-31) are used to control the Am2910A 
sequencer itself. 

A portion of the MSD data may be used to form a 
branch address by gating the lower 16 bits onto the 
YBUS. This machine uses a compressed Micro Store 
Word (vertically coded). The micro store branch 
address and condition code fields are overlapped with 
the Am29116 control field. This means that a given 
instruction may cause conditional branching to occur 
or may be used for an APU operation. 
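To make the overlapped (vertically coded) word layout concrete, the following sketch extracts the fields whose bit positions are stated in this chapter. The field names are hypothetical labels chosen for this example, and only bit ranges quoted in the text are used; the meaning of each encoded value is not modelled.

```python
# (low bit, high bit) of each field, taken from the bit numbers quoted in the text.
# Several ranges overlap because the micro-store word is vertically coded.
FIELDS = {
    "sequencer": (28, 31),   # Am2910A control
    "bus_op": (25, 27),      # memory operation latched by the Bus Sequencer
    "ybus_b": (20, 22),      # second YBUS source/destination field
    "ybus_a": (16, 19),      # first YBUS source/destination field
    "cc_select": (12, 15),   # condition code polarity and selection
    "count": (9, 12),        # shift/bit count passed to the Am29116
    "branch": (0, 15),       # branch address gated onto the YBUS
}


def field(word: int, name: str) -> int:
    """Extract one named field from a 32-bit micro-store word."""
    low, high = FIELDS[name]
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode(word: int) -> dict:
    """Return every field of the word, including the overlapping ones."""
    return {name: field(word, name) for name in FIELDS}
```

Decoding the same word as both `branch` and `count` shows the overlap directly: a branch micro-instruction cannot also carry an independent APU operation in those bits.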

The WCS Section, shown in Figure 1.7, has eight 4K 
X 4 static RAMs (AMD9968 CMOS static RAMs) which 
are used for microcode data. Four Am29818 
diagnostic pipeline registers are used for single-level 
pipelining of the microstore (RAM) data. Two 
Am29818 registers (Am29818 address access) can 
Figure 1.3 Condition Code PAL Device Circuit



PAL16L8
; Condition Code Pal
; ccl.pal.text

POL,C,B,A,CT,T1,T2,T3,/SRE,GND,
PLCLK,/CC0,/CTL,T4,/LT4,/LT3,/LT2,/LT1,/CC1,VCC

CC0 = /C*/B*/A* CTL          ; active low conditions
    + /C*/B* A* LT1          ; nct,nz,nc,p,novr
    + /C* B*/A* LT2
    + /C* B* A* LT3
    +  C*/B*/A* LT4
    @POL.

CC1 = /C*/B*/A*/CTL          ; active high conditions
    + /C*/B* A*/LT1          ; ct,z,c,n,ovr
    + /C* B*/A*/LT2
    + /C* B* A*/LT3
    +  C*/B*/A*/LT4
    +  C* B* A               ; unc
    @/POL.

; CT latch

CTL = /CT*SRE*/PLCLK         ; sample if status during last part
    + CTL*/SRE               ; keep if no status update
    + CTL*PLCLK              ; hold while plclk
    + CTL*/CT.

; T1 latch

LT1 = /T1*SRE*/PLCLK         ; sample if status during last part
    + LT1*/SRE               ; keep if no status update
    + LT1*PLCLK              ; hold while plclk
    + LT1*/T1.

; T2 latch

LT2 = /T2*SRE*/PLCLK         ; sample if status during last part
    + LT2*/SRE               ; keep if no status update
    + LT2*PLCLK              ; hold while plclk
    + LT2*/T2.

; T3 latch

LT3 = /T3*SRE*/PLCLK         ; sample if status during last part
    + LT3*/SRE               ; keep if no status update
    + LT3*PLCLK              ; hold while plclk
    + LT3*/T3.

; T4 latch

LT4 = /T4*SRE*/PLCLK         ; sample if status during last part
    + LT4*/SRE               ; keep if no status update
    + LT4*PLCLK              ; hold while plclk
    + LT4*/T4.

END

Figure 1.4 CCPAL Equations
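The latch-and-select behaviour of the CCPAL can be summarised in a behavioural sketch. It is a simplification, not a gate-level model: pin polarities are ignored, the flag names follow the comments in the PAL listing (ct, z, c, n, ovr), and the mapping of select codes 0 to 4 and 7 is read from the product terms. The class and method names, and the treatment of the unused codes 5 and 6 as "never true", are assumptions of this example.

```python
from dataclasses import dataclass

# Select code (C, B, A as a 3-bit number) -> latched flag, per the PAL product terms.
SELECT = {0: "ct", 1: "z", 2: "c", 3: "n", 4: "ovr"}


@dataclass
class ConditionCodes:
    ct: bool = False
    z: bool = False
    c: bool = False
    n: bool = False
    ovr: bool = False

    def clock(self, ct: bool, z: bool, c: bool, n: bool, ovr: bool, sre: bool) -> None:
        """Capture new status at the start of a cycle, or hold if SRE is not asserted."""
        if sre:
            self.ct, self.z, self.c, self.n, self.ovr = ct, z, c, n, ovr

    def test(self, select: int, negate: bool = False) -> bool:
        """Evaluate the condition chosen by the select code and polarity."""
        if select == 7:
            # Unconditional: present only in the non-negated equation.
            return not negate
        name = SELECT.get(select)
        if name is None:
            return False  # codes 5 and 6 have no product term
        value = getattr(self, name)
        return (not value) if negate else value
```

Latching the flags separately from the Am29116 is what lets a conditional branch test the result of the previous micro-instruction while the current one executes.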
Figure 1.6 Microstore Control Section
Figure 1.7 Writable Control Store (WCS)
Figure 1.8 Exception Control Section
Figure 1.9 Cache and Bus Control Section



jam the Microstore Address Bus and thus allow the
Microstore Memory to be stored from the pipeline
registers. The address access registers also help,
during diagnostics, by providing a convenient way to
look at the address from the Am2910A.

Exception Control Section 

The Exception Control Section (Figure 1.8) latches 
and regulates interrupt requests based on instructions 
from the MSD bus. 

The Exception Control Section uses the Am2914
Interrupt Controller. Interrupts IRQ0 thru IRQ7 are
prioritized and encoded to a 3-bit value. IRQ is
generated, and when the processor reads the
Am2914, the 3 Low bits on the YBUS give the
interrupt number.

The Am29LS18 outputs (I0-I3) are shared between
the Am2914 and the Am29517. The interrupt enable
signal, IEN, is used to separately qualify instructions to
the Am2914 (I0-I3). When the Am29517 is being
used, IEN is held High.

Cache And Bus Control Section 

The Cache and Bus Control Section (Figure 1.9)
produces almost all of the signals for the S/D Bus. For
clarity, this section is divided into three parts: the
Diagnostic Connector, the YBUS Source/Destination
Decode Block, and the Special Control Block.

The Diagnostic Connector provides access to the 
diagnostic pipeline registers and the timing logic of the 
processor. It is used to examine processor status, load 
the WCS, or change some data value in the processor 
or its bulk memory. With this facility, it is possible to 
debug hardware, micro-code or high-level software. 
The signals on this interface are not time critical so the 
diagnostic device can be kept simple and need not 
have super performance to be useful. 

The YBUS Source/Destination Decode Block decodes 
two fields of the MSD Bus (16-19, 20-22) into the 
respective register and buffer control signals for 
devices connected to the YBUS. 

The Special Control Block integrates a number of
functions which are necessary for the BIU. Due to the
critical timing nature of some of the signals and the
overall control relationship to the processor, they are
included in the CCU. This also leads to a compression
in the amount of circuit real estate devoted to these
functions.

The YBUS is the main data highway of the processor.
All of the functional blocks attach to the YBUS. A set of
fields in the micro-store word control activity on this
bus. One data value can be moved from source to
destination per pipeline cycle. The diagnostic
registers (Am29818's) attached to the YBUS
unconditionally capture the data at the end of each cycle.

YBUS Source and Destination Control Section

The YBUS Source and Destination Decode Block
produces the Source and Destination control signals
(S/D Bus) which are used to control register access to
or from the YBUS (Figure 1.10). The signals are
defined as follows:

Signal   Meaning

OEY      Enable output on Am29116
DREN     Enable incoming data register (BIU)
YREG     Enable output of Am29818 registers on YBUS
IODEN    Enable I/O unit incoming data
IREN     Enable instruction register (This signal is
         further qualified in regctl PAL.)
OEPM     Enable high product from Am29517 to YBUS
OEPL     Enable low product from Am29517 to YBUS
LDIOC    Move data into I/O control register
LDDR     Move data into data output register (BIU)
LDPC     Move data into program counter reg. (BIU)
LDAR     Move data into address register (BIU)
DLE      Move data to Am29116 (This signal is
         changed to DH by STEP PAL device in
         Diagnostics Section before feeding into
         the Am29116.)
LDPG     Move data to high address reg. (BIU)
LDPCPG   Move data to high PC-address reg. (BIU)
LDIOA    Move data to I/O address reg.
CVECT    Enable MSD bus to YBUS (bits 15-0) to
         enable branching
ENY      Enable Y register load on Am29517
ENX      Enable X register load on Am29517
IEN      Enable instruction for Am2914
ABEN     Enable readback on address bus (BIU)
PGEN     Enable readback on high address bus (BIU)
YSHIFT   Cause YBUS 3-0 to overlay MSD BUS 35-32,
         which are normally MSD BUS 12-9



Diagnostic Section 

The Diagnostic Section (Figure 1.11) consists of the
Diagnostic Connector and part of the 'STEP' PAL device
(B24). The Diagnostic Connector pin numbers are as
shown in Figure 1.11. The 'STEP' PAL device, as the
name suggests, is used to synchronize the incoming
signals from the Diagnostic Connector and cause the
XWAIT signal to be well-behaved with respect to the
master clock of the processor.
Figure 1.10 YBUS Decode Section
Figure 1.11 Diagnostics Section




PAL16R4
; MIP Processor Cache enables and step control
; step.pal.text

CP16M,/PLCLK,/DLE,/RESET,/HALT,/STEP,A0,/BYTE,A16,GND,
/EN,/C1L,/C1H,/XWAIT,/ST2,/ST1,/DH,/C0H,/C0L,VCC

DH := DLE*PLCLK*/RESET       ; hold data if dle
    + DH*/PLCLK*/RESET.      ; no change while plclk high

ST1 := STEP*/RESET.          ; detect the step pulse

ST2 := ST1*/RESET.           ; stage 2

XWAIT := HALT*/ST1*/ST2      ; normal halt
       + HALT* ST1           ;
       + HALT*ST1*ST2.       ; do step when /st1*st2

; Cache chip selects

C0L = /A16*BYTE
    + /A16*A0*/BYTE.

C0H = /A16*/A0*/BYTE
    + /A16*BYTE.

C1L = A16*BYTE
    + A16*A0*/BYTE.

C1H = A16*/A0*/BYTE
    + A16*BYTE.

END

Figure 1.12 Step PAL Equations





The meanings of the mnemonics in the Diagnostic
Connector are given below:

Signal Name  Utility

RCS          Read Control Store (Normally Asserted)
             causes MSD RAMs to assert data
DCLK         Data Clock (Normally Low) clocks data
             into Am29818 diagnostic register
CSW          Control Store Write (Normally High)
             strobes data into MSD RAMs
RST          General Reset signal (Normally High)
MODE         Am29818 control signal
SDI, SDO     Serial Data for Am29818 devices
HALT         Processor Halt signal
Am2910OE     Output enable for Am2910A (Normally
             Asserted)
STEP         Single Step control signal



The PAL equations for the 'STEP' PAL device are shown
in Figure 1.12. The cache decode signals share this
PAL device with the Diagnostics Section. Since the
cache has four 2k x 8 RAMs, 4 cache enables are
required to provide for byte access. The signals have
the following functions:

DH           A signal for the input data latch of the
             Am29116, follows DLE sampled by PLCLK
             (micro-program uses this to control input
             latch)
ST1, ST2     Registers used to detect edge of STEP for
             single step of the processor
XWAIT        The control signal derived from HALT and
             ST1 and ST2 which is used to stop the
             processor from an external device
C0L, C0H,    Individual chip enables for the cache RAMs
C1L, C1H
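The single-step behaviour of XWAIT can be checked with a small register-level simulation of the three registered equations (ST1, ST2, XWAIT) from the Step PAL. The sketch assumes RESET is inactive and treats every signal as a plain boolean in its asserted sense, ignoring pin polarity; the function name is illustrative.

```python
def simulate_xwait(halt_seq, step_seq):
    """Clock the ST1/ST2/XWAIT registers once per sample and return the XWAIT trace."""
    st1 = st2 = xwait = False
    trace = []
    for halt, step in zip(halt_seq, step_seq):
        # Registered outputs: next state is computed from the current state.
        next_xwait = (
            (halt and not st1 and not st2)   # normal halt
            or (halt and st1)
            or (halt and st1 and st2)
        )
        next_st2 = st1    # stage 2 follows stage 1
        next_st1 = step   # stage 1 samples the STEP input
        st1, st2, xwait = next_st1, next_st2, next_xwait
        trace.append(xwait)
    return trace
```

With HALT held asserted and one STEP pulse applied, the trace shows XWAIT dropping for exactly one PAL clock, one clock after the state /ST1*ST2 that marks the trailing edge of STEP.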



Special Control Block 

The Special Control Block consists of four
distinguishable sections (Figure 1.13). There is a
Cache Byte Decode Section to aid in decoding cache
accesses, a Register Control Section, a Bus
Sequence Control Section, and a MSD multiplexer (MUX).

The Cache and Byte Decode Section is shown in
Figure 1.14. Part of the 'STEP' PAL device is used to
demultiplex the address and byte-op information
given. It is necessary to decode to the byte-level for
the cache because the processor can do byte reads
and writes. A16 is part of the Address Bus (A16 is '1'
for C1L and C1H, '0' for C0L and C0H). The 4
individual byte-wide cache outputs are controlled from here.

Part of the 'SHIFT' PAL device is used to control byte
overlay for accesses over the DBUS. During memory
byte operations, the byte being read or written is right
justified in the data register of the BIU.

The Shift PAL equations are shown in Figure 1.17A,
since most of the PAL device is used in the 'SHIFT'
function.

The Register Control PAL device controls the
instruction fetch, and takes care of interrupt vectoring
and cycle stretching for store conflicts. The outputs
have the following interpretations:

IRLEN, IRHEN  Byte wide enables for the instruction
              register
PCINC         Program counter increment if no interrupt
IRQV          Enable interrupt vector to YBUS if
              interrupt accepted
COE           Enable cache if read and hit
STREQ         Store request for cycle in progress
              means BIU about to be busy. If BIU is
              already busy, there will be a wait


The Bus Sequence Control Section controls accesses
over the data and address bus to the cache and main
memory (Figure 1.16). The BSEQ PAL device
provides PLCLK, detects Data Acknowledge,
requests an external memory cycle if the cache
misses, and latches and holds the memory operation
(MSD 27-25) until the cycle is completed. The
micro-store instruction will specify that a certain type of bus
cycle be performed. This instruction is loaded into the
Bus Sequencer and the cycle is started. The
micro-program sequencer can continue onto other
instructions while this cycle is in progress. If another
bus cycle instruction (other than NOP) arrives while the
Bus Sequencer is busy, the Wait line will be asserted
until the current bus cycle completes.

There are also other conditions which can cause this 
Wait to occur. If the micro-instruction specifies a data 
or instruction register, while a bus cycle is in progress 
to fill that register, a Wait will occur until the register is 
filled. 
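The wait rules described above can be written down as a small decision function. This is a sketch of the prose rules, not a transcription of the PAL equations. The op-code numbers 1 to 6 and their mnemonics are taken from the comments in the BUSCTL PAL listing; assigning NOP to code 0 and WIO to code 7 is an assumption of this example, as are all Python names.

```python
from enum import IntEnum
from typing import Optional


class BusOp(IntEnum):
    NOP = 0    # assumed encoding
    RPS = 1    # program store read into IR
    CRPS = 2   # conditional program store read into IR
    RDSB = 3   # data store byte read into DR
    WDSB = 4   # data store byte write
    RDS = 5    # data store word read into DR
    WDS = 6    # data store word write
    WIO = 7    # wait on I/O (assumed encoding)


FILLS_IR = {BusOp.RPS, BusOp.CRPS}
FILLS_DR = {BusOp.RDSB, BusOp.RDS}


def must_wait(active: Optional[BusOp], requested: BusOp,
              uses_ir: bool = False, uses_dr: bool = False) -> bool:
    """Return True if the micro-instruction has to stall this cycle."""
    if active is None:
        return False   # Bus Sequencer idle: nothing to conflict with
    if requested != BusOp.NOP:
        return True    # a new bus op (or WIO) while a cycle is in progress
    if uses_ir and active in FILLS_IR:
        return True    # IR referenced before the fetch has filled it
    if uses_dr and active in FILLS_DR:
        return True    # DR referenced before the read has filled it
    return False
```

The point of the arrangement is that a micro-instruction which neither starts a bus cycle nor touches the register being filled runs at full speed while the memory access completes in the background.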
PAL16L8
; MIP Processor Register Control Pal
; regctl.pal.text

R0,R1,R2,/IRQ,PC0,/DREN,/HIT,/IREN,/LDPC,GND,
YBUS0,/IRLEN,/IRHEN,/PCINC,/IRQV,/DRLD,/IRLD,/COE,/STREQ,VCC

; Instruction register enables if no vector interrupt

IRLEN = /PC0*IREN*/IRQ.      ; low byte if no int
IRHEN = PC0*IREN*/IRQ.       ; high byte if no int

; Program counter increment if no vector interrupt

PCINC = IREN*/IRQ            ; inc pc if no vector interrupt
      + LDPC*YBUS0*/PC0      ; if loadpc and new <> pc0 then inc
      + LDPC*/YBUS0*PC0.

; Interrupt vector bit generation

IRQV = IREN*IRQ              ; decode vector interrupt condition
       @IREN*IRQ.            ; enable vectoring to ybus

; Cache output enable

COE = IRLD*HIT
    + DRLD*HIT.              ; cache enable if read and hit

; Store request for processor cycle in progress

STREQ = R2                   ; 4 - 7 data store ops and waits
      + /R2*R0               ; 1 & 3
      + /R2*R1*/R0*PCINC*PC0.  ; 2 auto fetch program store

END

Figure 1.15A REGCTL PAL
Figure 1.16 Bus Sequence Control Section



The types of bus cycles which may be performed are 
Read and Write Word, using AR (address register) and 
DR (Data Register), Read and Write Byte, using AR and 
the low-order half of DR, Read Word, using PC 
(Program Counter) and IR (Instruction Register), and 
Conditional Read Word, using PC and IR. This last 
cycle is used when fetching op-codes for high-level 
language execution. If the PC is odd and the IR is 
being accessed, a bus cycle will start to automatically fill 
the IR, using the value of the PC after it has been 
incremented by one. 
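The conditional fetch can be illustrated with a byte-stream reader over a word-wide instruction register. The sketch assumes a byte-addressed PC whose least significant bit selects the low byte (even) or high byte (odd) of the IR, as the IRLEN and IRHEN enables suggest; the class name, the list-of-words memory and the fetch counter are constructs of this example only.

```python
class InstructionFetcher:
    """Deliver op-code bytes from a 16-bit IR, refilling it after each odd byte."""

    def __init__(self, memory_words, pc: int = 0):
        self.memory = memory_words
        self.pc = pc                        # byte address
        self.ir = self.memory[pc >> 1]      # initial program store read
        self.fetches = 1

    def next_opcode_byte(self) -> int:
        odd = self.pc & 1
        # PC0 selects which half of the IR is enabled.
        byte = (self.ir >> 8) & 0xFF if odd else self.ir & 0xFF
        self.pc += 1
        if odd:
            # High byte consumed: start a read using the incremented PC.
            index = self.pc >> 1
            if index < len(self.memory):
                self.ir = self.memory[index]
                self.fetches += 1
        return byte
```

Only every second op-code byte costs a bus cycle, and that cycle overlaps with execution of the byte just delivered.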

A NOP instruction is also available for the Bus
Sequencer. This causes no action to be done to
memory, nor will it cause a Wait. There is a WIO (Wait on
I/O) op-code for the Bus Sequencer control logic. It
also causes no memory activity but does cause a
Processor Wait if the memory is being used. The Bus
Control Sequencer signals have the following
functions:

PLCLK     Is the master instruction clock for the
          processor
DTAK      The synchronized acknowledge from a
          memory device or cache
XRQ       Active during a memory request when there
          is no HIT on the cache (always active
          during write cycles)
R0,R1,R2  The latched memory store operation in
          progress
ACTIVE    Indicates a memory cycle in progress
PC0       A copy of the LSB of the program counter
          used to point to the proper IR byte and to
          determine when to do the next program
          store fetch.



The BUSCTL PAL device is controlled by the BSEQ
PAL device. The BUSCTL PAL device creates
BYTEOP and decodes signals DRLD, DROE, PCEN,
AREN, and the Wait State criteria. The four decoded
signals control the data flow on the main memory data
bus, 'DBUS', and determine which address register is
to supply the address for the bus cycle.

This PAL device also does the cache write enable if
one is allowed by the UPDATE signal (UPDT) from the
cache logic. Figure 1.16A gives the equations for the
MIP Processor Bus Control Sequencer PAL device. For
the BUSCTL PAL device (Figure 1.16B) the signals have
the following functions:

IRLD      Loads the Instruction Register (IR) when
          data is available from a Program Store Read
          cycle
DRLD      Loads the Data Register (DR) when data is
          available from a Data Store Read
BYTEOP    Signifies that the current cycle is a byte
          operation
CWE       Is the write enable for the Cache Data and
          Tag Memory; it occurs when valid data is
          written into the cache.
DROE      Is the enable for data register write to
          memory
PCEN      Is active when the Program Counter
          provides the memory address
AREN      Is active when the Address Register
          provides the memory address
WAIT      Causes the processor to wait when a
          conflict occurs within the BIU
PAL16R8
; Mip Processor Bus Control Sequencer Pal
; bseq.pal.text

CP16M,M25,M26,M27,/WAIT,/DACK,/HIT,/PCINC,/RESET,GND,
/EN,/PLCLK,PC0,/DTK,/ACTIV,/XRQ,R2,R1,R0,VCC

; main pipe clock for the processor 1 cycle = 1 instruction

PLCLK := /RESET*/PLCLK       ; going low
       + WAIT*/RESET.        ; wait request

; data acknowledge from either external memory or the cache
; only generate dtk for state 4 & 7 when not in external wait

DTK := XRQ*DACK*ACTIV*/RESET               ; dack detect normal cycles
     + /R2*/R1*R0*ACTIV*HIT*PLCLK*/RESET   ; 1 dtk for cache hit
     + /R2*R1*ACTIV*HIT*PLCLK*/RESET       ; 2 & 3 dtk for cache hit
     + R2*/R1*R0*ACTIV*HIT*PLCLK*/RESET.   ; 5 dtk for cache hit

; external memory request if data not in cache or a write cycle

XRQ := /R2*/R1*R0*/XRQ*/DTK*/RESET*ACTIV*/HIT*PLCLK   ; 1
     + /R2*R1*/XRQ*/DTK*/RESET*ACTIV*/HIT*PLCLK       ; 2 & 3
     + R2*/R1*R0*/XRQ*/DTK*/RESET*ACTIV*/HIT*PLCLK    ; 5
     + R2*/R0*/XRQ*/DTK*/RESET*ACTIV*PLCLK            ; 4 & 6 write thru cache
     + XRQ*/DTK*/RESET.

; store request codes r0 - r2 are transparent until an active cycle
; then they latch the code for that cycle until it is completed

/R0 := /M25*/ACTIV*/RESET    ; transparent while not active
     + /R0*ACTIV*/RESET.     ; then latch last state

/R1 := /M26*/ACTIV*/RESET    ; transparent while not active
     + /R1*ACTIV*/RESET.     ; then latch last state

/R2 := /M27*/ACTIV*/RESET    ; transparent while not active
     + /R2*ACTIV*/RESET.     ; then latch last state

; signifies that an active store request is in progress
; until an acknowledge of some sort is generated
; may not go active until wait goes away

ACTIV := ACTIV*/DTK*/RESET                              ; staying active
       + M27*/M26*/ACTIV*/DTK*/RESET*PLCLK*/WAIT        ; 4 & 5
       + M27*M26*/M25*/ACTIV*/DTK*/RESET*PLCLK*/WAIT    ; 6 data write
       + /M27*M25*/ACTIV*/DTK*/RESET*PLCLK*/WAIT        ; 1 & 3
       + /M27*M26*/M25*PCINC*PC0*/ACTIV*/DTK*/RESET*PLCLK*/WAIT.
                                                        ; 2 if pcinc

; a copy of the PC LSB for instruction register uses
; allows overlapped AR activity and IR use

/PC0 := /PCINC*/PC0*/RESET             ; keep what got
      + /PC0*WAIT*/RESET               ; hold while waiting
      + PCINC*/PLCLK*/PC0*/RESET       ; hold while PLCLK high
      + PLCLK*PCINC*PC0*/WAIT*/RESET.  ; active if no waits

END

Figure 1.16A BSEQ PAL Device
PAL16L8
; Mip Processor Bus Control Pal
; busctl.pal.text

R0,R1,R2,/STREQ,/IREN,/DREN,PLCLK,/UPDT,/ACTIV,GND,
/DTAK,/IRLD,/BYTEOP,/DRLD,/CWE,/DROE,/PCEN,/AREN,/WAIT,VCC

; Store control loads instruction register

IRLD = /R2*/R1* R0*ACTIV*/DTAK       ; 1 RPS
     + /R2* R1*/R0*ACTIV*/DTAK.      ; 2 CRPS

; Store control loads data register

DRLD = /R2* R1* R0*ACTIV*/DTAK       ; 3 RDSB
     +  R2*/R1* R0*ACTIV*/DTAK.      ; 5 RDS

; Byte wide data op

BYTEOP = /R2* R1* R0*ACTIV           ; 3 RDSB
       +  R2*/R1*/R0*ACTIV.          ; 4 WDSB

; Cache write control on current store control cycle

CWE = /R2*/R1* R0*ACTIV*UPDT*DTAK    ; 1 RPS
    + /R2* R1*/R0*ACTIV*UPDT*DTAK    ; 2 CRPS
    +  R2*/R1*/R0*ACTIV*UPDT*DTAK    ; 4 WDSB
    +  R2*/R1* R0*ACTIV*UPDT*DTAK    ; 5 RDS
    +  R2* R1*/R0*ACTIV*UPDT*DTAK.   ; 6 WDS

; Store control enables data out register (also called WE)

DROE =  R2*/R1*/R0*ACTIV             ; 4 WDSB
     +  R2* R1*/R0*ACTIV.            ; 6 WDS

; Store control using PC to supply address

PCEN = /R2*/R1* R0                   ; 1 RPS
     + /R2* R1*/R0                   ; 2 CRPS
     + /R2*/R1*/R0.                  ; to read back PC

; Store control using AR to supply address

AREN = /R2* R1* R0                   ; 3 RDSB
     +  R2.                          ; 4 - 7

; New store request conflicts with cycle in progress

WAIT = ACTIV*STREQ*/PLCLK            ; store active and pending request
     + /R2*/R1* R0*ACTIV*/PLCLK*IREN ; 1 RPS not done before IREN
     + /R2* R1*/R0*ACTIV*/PLCLK*IREN ; 2 CRPS not done before IREN
     + /R2* R1* R0*ACTIV*/PLCLK*DREN ; 3 RDSB not done before DREN
     +  R2*/R1* R0*ACTIV*/PLCLK*DREN ; 5 RDS not done before DREN
     + DTAK*STREQ*/PLCLK.            ; store active and pending request

END

Figure 1.16B BUSCTL PAL Device
The MSD MUX (Figure 1.17) provides a method 
whereby the rotation, or bit number, can be passed from 
the YBUS back into the micro-instruction. When 
YSHIFT is asserted, the YBUS data overlays the regular 
MSD data bits. This is caused by a particular YBUS 
store code in the MSD control word. The equations for 
the 'SHIFT' PAL device are shown in Figure 1.17A. 

The SHIFT PAL signals have the following functions: 

I9, I10,     The multiplexed instruction to the 
I11, I12     Am29116. 

DROEH        Output enable for the upper byte of the data 
             register for word writes to memory 

BYTEN        Enable for the transceiver connecting the 
             upper and lower bytes of the data bus, 
             used to properly justify byte data being 
             read/written to memory. 



(Diagram labels: YSHIFT from S/D bus, MSDi, YBUS, part of SHIFT PAL, output to Am29116.) 

Figure 1.17 MSD Mux 



PAL16L8 

; MIP Processor Shift Control Pal 
; shift.pal.text 

MSD9,MSD10,MSD11,MSD12,Y0,Y1,Y2,Y3,/YSHIFT,GND, 
/DROE,I9,I10,I11,I12,A0,/BYTE,/DROEH,/BYTEN,VCC 

/I9  = /MSD9*/YSHIFT      ; shift value from microcode 
     + /Y0*YSHIFT.        ; shift value from ybus 

/I10 = /MSD10*/YSHIFT     ; shift value from microcode 
     + /Y1*YSHIFT.        ; shift value from ybus 

/I11 = /MSD11*/YSHIFT     ; shift value from microcode 
     + /Y2*YSHIFT.        ; shift value from ybus 

/I12 = /MSD12*/YSHIFT     ; shift value from microcode 
     + /Y3*YSHIFT.        ; shift value from ybus 

DROEH = DROE*/BYTE.       ; use upper byte if not byte write 

BYTEN = BYTE*/A0.         ; enable mux to upper byte 

END 

Figure 1.17A SHIFT PAL Device 
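
The SHIFT PAL equations amount to a 4-bit two-way multiplexer plus two byte-control terms. The following is a minimal Python sketch of that behavior, not part of the original design files; the function name `shift_pal` and the packing of the four bits into one integer (bit 0 = MSD9, Y0 and I9) are assumptions made for illustration, and all signals are treated in their asserted (active-high) sense.

```python
def shift_pal(msd, y, yshift, droe, byte, a0):
    """Behavioral model of the SHIFT PAL equations (hypothetical helper).

    msd    -- MSD12..MSD9 packed as a 4-bit int, bit 0 = MSD9 (assumed packing)
    y      -- Y3..Y0 packed as a 4-bit int, bit 0 = Y0
    yshift -- True when YSHIFT is asserted
    droe, byte, a0 -- asserted state of DROE, BYTE and address bit A0
    Returns (i, droeh, byten) where i holds I12..I9, bit 0 = I9.
    """
    # /I9 = /MSD9*/YSHIFT + /Y0*YSHIFT reduces to: I9 = Y0 if YSHIFT else MSD9
    i = (y if yshift else msd) & 0xF
    droeh = droe and not byte      # DROEH = DROE*/BYTE
    byten = byte and not a0        # BYTEN = BYTE*/A0
    return i, droeh, byten


if __name__ == "__main__":
    # Shift count taken from the microword, word write in progress
    print(shift_pal(0b1010, 0b0101, False, True, False, False))
    # Shift count overlaid from the YBUS
    print(shift_pal(0b1010, 0b0101, True, False, True, False))
```

Writing the multiplexer as a conditional rather than as the literal sum-of-products keeps the sketch short; the active-low output form used in the PAL listing is a property of the PAL16L8, not of the logic function.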
The Bus Interface Unit 

The Bus Interface Unit, or BIU, consists of the registers 
and transceivers necessary to interface to the main 
memory and the cache (Figure 1.18). The Data Section 
(Figure 1.19) consists of two 16-bit bidirectional data 
registers, and a 16-bit instruction register, accessible 
one byte at a time (op-codes are one byte each). There 
is also a byte MUX to allow reading the high byte, or 
writing to the high byte (for data accesses) in a byte- 
addressed fashion, with the resulting data being right- 
justified in the data register. 

The Address Section (Figure 1.20) consists of a 24-bit 
address register and a 20-bit program counter. The 
address register is used with the data register for memory 
data transfers. The program counter is used with the 
instruction register to implement a pre-fetched program 
store. When data bytes from the instruction register are 
read, the Program Counter is auto-incremented. 

Both the Address Register and the Program Counter may 
be read back to the YBUS by the CPU via readback buffers. 

The Memory Section 

The Memory Section (Figure 1.21) consists of a Cache 
Section (Figure 1.22), which runs at processor speed, 
and a Main Memory Section (Figure 1.23), which runs at 
about a 350 ns access time. 

A single-set associative cache is used. The size is quite 
large: 4K words are available. A 75% hit ratio with this 
size cache is estimated. 

The Main Memory Section consists of 128k of 16-bit 
words, two PAL devices for sequencing, and an 
Am2964B DRAM Controller. 

The Cache Memory consists of 4k words of 70 ns static 
RAM for data, and a 4k x 4 RAM for address tags. The 
data RAMs are accessible by byte to support byte 
operations. Control circuitry is provided to supply the 
HIT/MISS acknowledge and allows for tag and data 
updating under control of the CCU. A write-thru-cache 
scheme is employed. 

The Tag Memory is accessed for every memory 
operation. Four address bits (7,9,11,14) form the tag 
data and the rest form the address to the tag and data 
RAMS. The tag data is compared with address bits (7, 
9, 11, 14) and the HIT/MISS status is reported to the 
Cache Control Block. The upper address bits are used 
to check range validity. 

If the memory cycle is a Read and there is a Hit on the 
cache (requested data is in the cache), then cache 
data is output on the DBUS and no bulk memory cycle 
is performed. If a Miss occurs, then a bulk memory 
cycle will happen. When data is available from the bulk 
memory, it will be written into the cache as well as 
being loaded into the proper data register. 

A Hit on the cache memory during a Write cycle will 
cause the data in the cache, as well as that in the bulk 
memory, to be updated. If a Miss occurs, then only 
the bulk memory will be updated. 



(Diagram labels: S/D BUS, DATA SECTION, ADDRESS SECTION, DBUS, ABUS.) 

Figure 1.18 The Bus Interface Unit (BIU) 
Figure 1.21 Memory Section 

In order to keep cache consistency, a Read byte miss 
on the cache will not cause data to be written to the 
cache. This is because a byte read from the bulk 
memory fetches only one byte, so a complete word is 
not available to fill the cache entry. A byte write hit 
on the cache will update the correct byte as well as the 
main memory byte. 
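
The read and write rules described above (write-through, fill on a word read miss only, no fill on a byte read miss or on any write miss) can be sketched as a small behavioral model. This is an illustrative Python sketch under stated assumptions, not the hardware implementation: the class name `CacheModel`, the dictionary-based storage, and the simple index/tag split (word address modulo and divided by 4096) are all hypothetical, and byte accesses are simplified to return the whole stored word.

```python
class CacheModel:
    """Toy model of the cache update policy (hypothetical names throughout)."""

    LINES = 4096  # 4k words of cache data, as stated in the text

    def __init__(self, main):
        self.main = main      # dict: word address -> 16-bit word (bulk memory)
        self.lines = {}       # dict: index -> (tag, word)

    def _split(self, addr):
        # Assumed index/tag split; the real design uses specific address bits.
        return addr % self.LINES, addr // self.LINES

    def _hit(self, addr):
        index, tag = self._split(addr)
        entry = self.lines.get(index)
        return entry is not None and entry[0] == tag

    def read(self, addr, byte=False):
        index, tag = self._split(addr)
        if self._hit(addr):
            return self.lines[index][1], "hit"    # no bulk memory cycle
        word = self.main.get(addr, 0)             # bulk memory cycle
        if not byte:
            self.lines[index] = (tag, word)       # word read miss fills cache
        return word, "miss"                       # byte read miss: no fill

    def write(self, addr, word):
        index, tag = self._split(addr)
        self.main[addr] = word                    # write-through: always
        if self._hit(addr):
            self.lines[index] = (tag, word)       # write hit updates cache


if __name__ == "__main__":
    cache = CacheModel({5: 0x1234})
    print(cache.read(5, byte=True))   # miss, cache left untouched
    print(cache.read(5))              # miss, cache filled
    print(cache.read(5))              # hit
```

Because every write goes through to the bulk memory, the cache never holds data that the main memory lacks, which is why no write-back or special initialization logic is needed in this model.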



There is no special cache initialization logic. After 
power up, all that is required is to read the block of 
main memory which is covered by the cache memory. 
This makes the cache memory consistent with the 
main memory. Subsequent memory operations will 
not change this. 



PAL16L8 

MIP Processor Cache Memory Control 
cache.pal.text 

A17,A18,A19,A20,A7,A9,A11,A14,/CWE,GND, 
/DROE,/UPDT,OA11,OA9,OA7,/HIT1,OA14,/HIT,/LOCAL,VCC 

LOCAL = /A20*/A19*/A18*/A17.   ; decode for local space 

; HIT,HIT1 do the 4 bit = comparison for tag matches ( 16 terms ) 

HIT  = /A7*/OA7*/A9*/OA9*/A11*/OA11* OA14* A14 */A20*/A19*/A18*/A17 
     + /A7*/OA7*/A9*/OA9*/A11*/OA11*/OA14*/A14 */A20*/A19*/A18*/A17 
     + HIT1*/OA14*/A14                         */A20*/A19*/A18*/A17 
     + HIT1* OA14* A14                         */A20*/A19*/A18*/A17. 

HIT1 = /A7*/OA7 */A9*/OA9 * A11* OA11    ; do 7 of 8 terms 
     +  A7* OA7 */A9*/OA9 * A11* OA11 
     + /A7*/OA7 * A9* OA9 * A11* OA11 
     +  A7* OA7 * A9* OA9 * A11* OA11 
     +  A7* OA7 */A9*/OA9 */A11*/OA11 
     + /A7*/OA7 * A9* OA9 */A11*/OA11 
     +  A7* OA7 * A9* OA9 */A11*/OA11. 

/OA14 = /A14      ; when CWE enable address to output 
      @ CWE. 

/OA11 = /A11      ; when CWE enable address to output 
      @ CWE. 

/OA9  = /A9       ; when CWE enable address to output 
      @ CWE. 

/OA7  = /A7       ; when CWE enable address to output 
      @ CWE. 

UPDT = /DROE*/HIT                ; if read miss 
     +  DROE* HIT                ; if a write hit 
     +  CWE                      ; hold if start write 
     @ /A20*/A19*/A18*/A17.      ; can update only if local access 

; the following table indicates the cache update algorithm 
; cache updates only occur on the local ram segment. 
; 
;                 HIT       MISS 
;   Read word     nc        update 
;   Read byte     nc        nc 
;   Write word    update    nc 
;   Write byte    update    nc 

END 

Figure 1.22A Cache PAL Device 
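
The HIT and HIT1 equations split a 4-bit equality comparison across two PAL outputs because a single output does not have enough product terms. As a cross-check, the following Python sketch evaluates the sum-of-products exactly as listed and compares it against a plain equality test over all 256 input combinations. The function names and the use of 0/1 integers for signals are assumptions for illustration; the LOCAL decode is passed in as a single flag.

```python
from itertools import product

# The seven (A7, A9, A11) value combinations covered by HIT1 in the listing.
# The eighth combination (0, 0, 0) is covered directly by the HIT equation.
HIT1_TERMS = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1),
              (1, 0, 0), (0, 1, 0), (1, 1, 0)]


def hit1(a7, oa7, a9, oa9, a11, oa11):
    """HIT1: each term requires A == OA == the listed value for bits 7, 9, 11."""
    return any((a7, oa7, a9, oa9, a11, oa11) == (v7, v7, v9, v9, v11, v11)
               for v7, v9, v11 in HIT1_TERMS)


def hit(a7, oa7, a9, oa9, a11, oa11, a14, oa14, local=True):
    """HIT: the all-zero term or HIT1, with bit 14 matching, gated by LOCAL."""
    low_all_zero = (a7, oa7, a9, oa9, a11, oa11) == (0, 0, 0, 0, 0, 0)
    bit14_match = a14 == oa14   # covers both OA14*A14 and /OA14*/A14 terms
    return local and bit14_match and (low_all_zero or hit1(a7, oa7, a9, oa9, a11, oa11))


def equations_match_equality():
    """True if HIT equals a 4-bit tag equality test for every input combination."""
    for bits in product((0, 1), repeat=8):
        a7, oa7, a9, oa9, a11, oa11, a14, oa14 = bits
        expected = (a7, a9, a11, a14) == (oa7, oa9, oa11, oa14)
        if hit(*bits) != expected:
            return False
    return True


if __name__ == "__main__":
    print(equations_match_equality())
```

The exhaustive check is cheap here because there are only eight inputs; it confirms that "7 of 8 terms" in HIT1 plus the remaining all-zero term in HIT together cover every matching tag value.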
The Cache PAL device signals have the following 
functions: 

LOCAL        Indicates that the on-board 256k of 
             memory is being accessed 

HIT          Indicates that a tag match has occurred 

HIT1         Creates some of the terms required for HIT 

OA14, OA11,  Data to/from the tag memory 
OA9, OA7 

UPDT         Indicates if an update should be done to 
             the cache memory 

The Main Memory uses an Am2964B DRAM controller. 
Two PAL devices are required: one for sequencing 
(RAMSEQ) and one for decoding signals for the RAM 
array. 

A synchronous system clock at 16 MHz is used by the 
sequencer. Since this clock is synchronized to the 
processor clock, no exotic timing is required to get 
good memory response. 

The RAMSEQ PAL device generates the necessary 
RAS, CAS, and MUX signals for the Am2964B DRAM 
controller. The refresh timer (74LS393) generates a 
16 µsec clock which is used for refresh control. The 
RAMSEQ PAL device does the necessary arbitration 
between refresh and regular memory cycles. 

The RAMDCD PAL device generates some 
miscellaneous signals for the DRAM array and controls 
the Am29833 parity transceivers. Byte parity is 
checked in the DRAM array. A parity error will appear 
as an interrupt to the processor. 
Figure 1.23 Main Memory Section 

PAL16R6 

MIP Processor Ram Seq & Refresh Control 
ramseq.pal.text 

CP16M,/XRQ,/REFCK,/RESET,NC,NC,NC,NC,NC,GND, 
/EN,/DAK,NC,/REF_DNE,/RFSH,/TC,/MUX,/RAS,/CAS,VCC 

RAS to memory controller 

RAS := /RAS*/MUX*/TC*/RFSH*/RESET*XRQ 
     +  RAS*/TC*/RESET 
     +  RFSH*/RESET*/REF_DNE. 

MUX := RAS*/MUX*/TC*/RFSH*/RESET          ; change address mux 
     + MUX*/TC*/RESET. 

tc ends RAS cycle and provides precharge delay between rfsh & active cycles 

TC := /TC*/RFSH*RAS*MUX*/RESET            ; complete ras cycle 
    +  TC*/RFSH*XRQ*/RESET 
    +  RFSH*RAS*/REF_DNE*/RESET           ; start tc for refresh cycle 
    +  RFSH*RAS*TC*/RESET.                ; hold tc until refresh ras done 

rfsh active for duration of refresh cycle 

RFSH := /RAS*/MUX*/TC*/RFSH*/REF_DNE*/RESET*/XRQ*REFCK ; refresh in progress 
      + RFSH*/REF_DNE*/RESET 
      + RFSH*REF_DNE*RAS.                 ; keep rfsh until ras done 

indicates that refresh cycle done for this refck cycle 

REF_DNE := /REF_DNE*RFSH*TC*/RESET        ; refresh for this refck has been done 
         +  REF_DNE*REFCK*/RESET          ; until refck goes low 
         +  REF_DNE*RAS*/RESET            ; remainder of refresh cycle 
         +  REF_DNE*TC*/RESET 
         +  REF_DNE*RFSH*/RESET. 

CAS = MUX*/RESET                          ; CAS for memory 
    + CAS*XRQ*/RESET. 

DAK = XRQ*TC*/RAS*CAS*/RESET              ; dtack for memory 
    @ XRQ*TC*/RAS*CAS*/RESET. 

Figure 1.23A RAMSEQ PAL Device 
For the RAMSEQ PAL, the signals have the following 
functions: 

RAS          RAS timing signal to the Am2964 controller 

MUX          Mux timing signal to the Am2964 controller 

TC           Indicates that the RAM cycle is complete 

RFSH         Indicates a refresh cycle is in progress 

REF_DNE      Indicates that the requested refresh has 
             been done 

CAS          CAS timing signal to the Am2964 controller 

DAK          Data acknowledge for the dynamic RAM 
             cycle 

The equations for the 'RAMDCD' PAL device are shown in 
Figure 1.23B. 



PAL16L8 

; MIP Processor error control & misc decode 
; ramdcd.pal.text for 29833's 

/WE,NC,/XWAIT,/BYTE,/XRQ,NC,A0,/CAS,YBUS13,GND, 
/LDIOA,/WEL,/REU,/WEH,/REL,NC,/CLRERR,/WAIT,/SWAIT,VCC 

WEL = WE*XRQ*A0*BYTE           ; WE to lower ram bank 
    + WE*XRQ*/BYTE. 

WEH = WE*XRQ*/A0*BYTE          ; WE to upper ram bank 
    + WE*XRQ*/BYTE. 

REU = CAS*XRQ*/A0*BYTE*/WE     ; for lower byte of mem 
    + CAS*XRQ*/BYTE*/WE. 

REL = CAS*XRQ*A0*BYTE*/WE      ; upper byte of mem 
    + CAS*XRQ*/BYTE*/WE. 

CLRERR = LDIOA*YBUS13.         ; parity error clear 

SWAIT = WAIT + XWAIT.          ; allow external waits 

END 

Figure 1.23B RAMDCD PAL Device 
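
The write-enable and read-enable terms in the RAMDCD listing select one byte lane for byte operations (steered by A0) and both lanes for word operations. The following Python sketch evaluates those four equations literally so the lane selection can be tabulated; the function name `ramdcd` and the dictionary return value are assumptions for illustration, and every argument is the asserted state of the corresponding signal.

```python
def ramdcd(we, xrq, byte, a0, cas=True):
    """Evaluate the RAMDCD byte-lane equations (hypothetical helper).

    we, xrq, byte, cas -- True when the signal is asserted
    a0                 -- state of address bit A0
    """
    wel = (we and xrq and a0 and byte) or (we and xrq and not byte)
    weh = (we and xrq and not a0 and byte) or (we and xrq and not byte)
    reu = (cas and xrq and not a0 and byte and not we) or \
          (cas and xrq and not byte and not we)
    rel = (cas and xrq and a0 and byte and not we) or \
          (cas and xrq and not byte and not we)
    return {"WEL": wel, "WEH": weh, "REU": reu, "REL": rel}


if __name__ == "__main__":
    # Tabulate every read/write, byte/word, A0 combination for an active request
    for we in (False, True):
        for byte in (False, True):
            for a0 in (False, True):
                print(we, byte, a0, ramdcd(we, True, byte, a0))
```

With no request pending (XRQ not asserted) every enable is inactive, and a read never asserts a write enable, which is the property the `/WE` factor in the REU and REL terms provides.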
The RAMDCD PAL signals have the following 
functions: 

WEL, WEH     Write enables for the upper and lower RAM 
             bytes 

REH, REL     Output enables for the Am29833 
             transceivers 

CLRERR       Clears the parity latch on the Am29833 

SWAIT        Combines a couple of WAIT signals 

I/O Unit 

The I/O unit is shown in Figure 1.24. All registers and 
buffers are currently 8 bits wide. The I/O unit 
interfaces with the peripheral processor which does all 
of the I/O work. There are buffers in one direction and 
registers in the other direction so the transfer path can 
be pipelined (i.e., the MIP processor does not have to 
wait for a message to be read). 

There is also an 8-bit register for control signals and an 
8-bit buffer for status signals from this interface. These 
are used for signalling and synchronization of the 
peripheral devices. 
Figure 1.24 I/O Unit 

Chapter 2 
SOFTWARE DESCRIPTION 



2.0 OVERVIEW 

The MIP processor is designed to be used in several 
different environments requiring high-speed processing 
capability. The main advantage of the processor is 
that the instruction set can be tailored to the target 
system, thus achieving near ideal execution efficiency. 

The first implementation given here is an instruction 
processor for Pascal. The Pascal compiler emits 
intermediate code called P-code. This code is based 
on the concept of a stack processor and is designed to 
be compact and reasonably easy to generate from a 
high-level language. 

The P-code is executed by an interpreter written in 
micro-code. This intermediate code thus becomes the 
instruction set of the processor; a significant speed 
advantage is obtained over other types of processors. 
Intrinsics and other special purpose routines can be 
written in micro-code and linked with the intermediate 
code. Changes or upgrades in the machine hardware 
only mean minor modifications to the micro-code 
interpreter, not wholesale changes to a compiler. It is 
also very easy to port such a system to a number of 
different processors. 

The task of writing an interpreter for such a machine is 
not difficult but does require a good set of tools, such 
as assemblers, and trace and debug utilities. The 
authors have developed a set of tools which, when 
used in conjunction with the diagnostic pipeline 
registers, give a unified approach to micro-code 
development. These tools will work with various 
personal computers which have Pascal capability. 

As this processor is also designed to occupy minimum 
real estate, a number of trade-offs were employed so as 
to fit the processor in a small space yet retain 
performance. 

The micro-code word is designed to be 32 bits in 
length. This is generally regarded as being short, but 
good performance is still possible. By using a short 
word and eliminating some of the hardware 
surrounding micro-programmed machines, it is possible 
to make a processor which is competitive with traditional 
NMOS/CMOS processors, when compared on the 
basis of processing power vs. board real estate. Given 
twice the NMOS equivalent of real estate, one can end 
up with a machine having 8 to 10 times the 
performance of an NMOS machine. Interestingly 
enough, the comparison is also relevant when 
comparing processing power vs. cost (i.e., 10 times 
the performance at twice the cost). 

The use of overlapped fields for the Am29116 and the 
jump address field allows the micro-code to fit in 32 bits. 
The cost is only about 3-4% in total performance because 
not all op-codes use Jump instructions and those that 
do are usually limited by the memory bandwidth. 

The traditional instruction mapping PROMs were 
replaced by a table mapping technique which keeps 
the vector table in the same code space as the main 
micro-code. Such a technique costs a bit of execution 
time (approx. 5-6%) but saves considerable power and 
space. It is also useful where the micro-code is 
constantly being modified, as the instruction mapping 
table is now created and loaded with the micro-code. 

Although the processor is compact, a number of pipe- 
line stages, which can be used by the micro-code, 
exist within the processor. These increase the 
utilization of each component of the processor and, in 
turn, increase performance. 

The first major pipe-line stage is the memory interface. 
Here, either an instruction fetch or data store operation 
can be performed in parallel with Am29116 instruction 
execution. As long as there is no conflict (i.e., using a 
value from a Read Data operation before the read 
completes), the micro-code will not experience any 
Wait cycles. If there is a conflict, the processor will 
pause automatically until the conflict is resolved. This 
is the most frequently used pipe-line stage, as there are 
often a number of micro-instructions to be executed 
while data is being accessed. The instruction pre- 
fetch mechanism is also part of the same pipe-line. 
While the processor is using the last data value from 
the instruction register, new data is being read using 
the program counter. 

The next important stage is between the Am29116 
and the YBUS. Data may be operated on, inside the 
Am29116, while other data is being moved from 
source to destination on the YBUS. A temporary 
holding register on the YBUS may be used to delay or 
duplicate a data transfer without involving the 
Am29116. 

A third pipe-line stage exists with the Micro-program 
Sequencer. It is possible to be utilizing all of the 
previous pipe-line stages while performing certain 
types of micro-program loop or subroutine returns. 

All of these stages are used to allow efficient micro- 
program execution and help to offset some of the 
trade-offs that were made to compact the micro-code 
into 32 bits. 

A mechanism is included to allow data values, which 
appear on the YBUS, to be used in the bit-oriented 
instructions of the Am29116 as part of the instruction. 
This allows dynamic bit-oriented instructions to be 
created. Packed field, Set operations and Floating- 
point routines use this class of instruction frequently. 

Computed Jumps and Calls are also possible by 
placing the desired data on the YBUS and doing a 
Jump or Call with the Micro-program Sequencer. 
These have the effect of reducing the overall code 
size and also help to improve performance by 
eliminating costly chains of Test and Jump instructions. 

2.1 MICRO-CODE DEVELOPMENT 

Several micro-code assemblers, including several "meta" 
assemblers, are available for code development. These 
are usually more difficult to use than the assemblers 
that are used for standard micro-processor work, because 
they do not reflect field relationships in the target 
machine. Complexity in the assembler syntax due to 
architectural parallelism is forgivable. Less forgivable is 
the complexity that arises when each field of the 
machine language must be independently specified. 

As this project required a lot of micro-code 
development, some effort was made to streamline the 
code creation process. Inspection of the published 
Am29116 instruction set shows some redundant 
information which is handled by the improved 
assembler. This makes the resulting source code much 
easier to write and debug. Features which are unique 
to the machine may be included as optional 
parameters, separated by commas, after the main 
instruction code. Examples of this micro-code format 
are shown in Appendix A. Streamlining of the 
micro-instructions also makes it easier to upgrade the 
machine to larger word sizes and add new features 
when required. 

The micro-code word for this processor is 32 bits long, 
which is relatively short for a micro-coded machine. 
There is one overlapped 16-bit instruction field, shared 
between the Am29116 instruction and the Jump 
address field. This means that there are two instruction 
formats. The first is a data type instruction and will 
involve some action by the Am29116 and optional 
operations by other data elements of the processor. 
The second instruction type is a Control instruction. 
Here, micro-instruction control will conditionally change 
(e.g., JUMP, CALL, RTS). Each micro-instruction has 6 
other fields which occupy the remaining 16 bits of the 
micro-store width. They are used to specify data path 
(YBUS) source and destination, memory control, 
sequencer control, and flag register updates. All of the 
fields must be defined for each word. To make the task 
of writing micro-code easier, the micro-assembler uses a 
default micro-instruction word. This instruction is 
initialized at the beginning of each source line and then 
modified by operands on the source line. If written out, 
the default control word would appear as: 

NOOP CONT, SRE, IE, NYS, NYD, NSR 

This sequence specifies a NOP instruction to the 
Am29116, a continue to the Am2910 sequencer, 
status flag update, Am29116 instruction enable, no 
YBUS source, no YBUS destination, and no store 
control operation. 

Any name which exerts control over the YBUS, store 
control, micro-sequencer, or status flag updates may 
be entered in free field form after all the required 
operands for the main instruction. These will be denoted 
optional parameters in the following descriptions. 

Any Am29116 instruction mnemonic which contains 
an 'I' requires an immediate data operand following the 
Am29116 instruction descriptor. 

The micro-assembler performs a number of syntax 
checks on the generated code to detect invalid 
instruction combinations, illegal instruction sequen- 
ces, and missing operands. 

2.2 OP-CODE EXECUTION 

Creation of an interpreter for a micro-coded 
intermediate code machine is quite straightforward, as 
there are several good descriptions of the intermediate 
object code in print. 

The basic concept of an intermediate code processor is 
that of a stack machine. A number of registers point to 
various constructs of this stack machine such as local 
and global data frames, the heap and the stack. The 
various op-codes move data values between these 
frames and the stack, and operate on stack elements. 

All the registers in the hypothetical stack machine are 
contained within the register file of the Am29116, 
leaving 24 work registers for op-codes and intrinsic 
procedures to use. 

This processor has no dedicated hardware to maintain 
the stack. This may appear to be a short-coming of the 
processor, but it is not serious. The presence of the 
cache memory means that most of the stack elements 
are available without a Wait, as they are usually the 
most frequently referenced items. To reduce the 
number of memory references to the top of stack 
element, it is kept within the register file. This also 
allows a form of pipe-lining to be done during op-code 
execution. A side effect is that stack reads and writes 
now get done during a different portion of the 
execution of the op-code. This reduces the peak rate 
of demand on the memory system. 
The instruction stream for the intermediate code is 
byte-oriented. Within the processor is a two-byte 
instruction prefetch queue, filled on demand by the 
Bus Sequencer. The micro-code is arranged so that 
there is an instruction jump table starting at F00 (hex). 
As there are 256 op-codes, this table occupies the last 
256 locations of the 4k micro-store. During an op-code 
fetch, the instruction byte pointed to by the program 
counter is enabled onto the lower byte of the YBUS. The 
high byte is forced to 1's by pull-up resistors. The 
microprogram sequencer is told to do an unconditional 
jump to the address on the YBUS and so ends up in 
the instruction vector table. This table is comprised of 
jumps to the individual op-code routines. A high 
percentage of instructions only use one or two bytes 
of instruction so the pre-fetch queue works quite well. 
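
The op-code dispatch described above reduces to a simple address computation: the op-code byte forms the low byte of the YBUS, the pulled-up high byte supplies the 1's, and the sequencer jumps to the result. A minimal Python sketch follows. The function name is hypothetical, and the 12-bit micro-address width is an assumption inferred from the 4k micro-store size rather than something stated directly in the text.

```python
MICRO_ADDRESS_BITS = 12   # assumed from the 4k x 32 writable control store
JUMP_TABLE_BASE = 0xF00   # start of the instruction jump table (from the text)


def dispatch_address(opcode):
    """Return the micro-address of the jump-table entry for an op-code byte."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("op-code must fit in one byte")
    ybus = 0xFF00 | opcode                       # high byte pulled up to 1's
    return ybus & ((1 << MICRO_ADDRESS_BITS) - 1)


if __name__ == "__main__":
    for opcode in (0x00, 0x2A, 0xFF):
        print(hex(opcode), "->", hex(dispatch_address(opcode)))
```

Each table entry is itself a jump to the op-code routine, so dispatch costs one extra micro-instruction per op-code in exchange for removing the mapping PROM.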

In the event of an interrupt, the next byte of the 
instruction queue is not enabled onto the YBUS. 
Instead, bit 8 of the YBUS is forced Low. The resulting 
Jump is to FEF (hex) which contains a Jump to the 
interrupt service routine. (YBUS bit 8 is tied to IVECT in 

nnoo I L rAi_.) 



Several examples of micro-code op-codes are shown 
in Appendix A. The total amount of micro-code for all 
op-codes and intrinsics is approximately 2.5 k words. 
With a 4 k micro-store, this leaves sufficient room for 
new op-codes and intrinsic procedures. 

On average each op-code takes about 8 micro-code 
instructions, one of which is the jump to the next 
instruction. Some subroutines are used to keep the 
code compact. 

2.3 MICRO-CODE DEBUG TOOLS 

Normally, to debug the hardware and micro-code of a 
machine such as this would require a special develop- 
ment station. The use of diagnostic registers on critical 
parts of the MIP processor allows the use of much 
simpler hardware. A small personal computer such as 
an APPLE II or a PC, with Pascal language capability 
can be programmed to act as a development station or 
debug tool. Currently a 68000 based system is being 
used. 

Access to the MIP processor diagnostic connector 
requires a TTL level I/O port with 9 output pins and 1 
input pin. The output bits could just be registered, 
although faster port operation would occur if 7 were 
registered and 2 were pulse outputs (DCLK and 
STEP). 



A Pascal program executing on the workstation 
provides access to all of the processor registers, the 
writable control store, and the main memory. This 
same workstation is used to edit and assemble the 
micro-code for the processor under development. 

The debug tool is menu driven. The display normally 
shows all the registers of the processor. When a 
command is entered, the action is performed and the 
display updated. This is suitable for single-stepping 
through micro-code or changing register values. 
Branches to new sections of code can also be done. 

Loading of the writable control store is done by 
specifying an object file (which is located on the 
workstation disk) to be loaded into the writable control 
store. 

Code and data files may also be loaded into the main 
memory in a similar manner. Commands exist to 
display a block of 128 main memory bytes at once, as 
well as change single bytes or words. 

Breakpoints may be specified for the micro-code. 
Execution of the code will progress until the 
breakpoint is encountered and the display will be 
updated. This mode of execution is not done in real 
time, however, as the workstation checks the micro- 
address after each instruction. This mode is quite 
useful for debugging most code. Other techniques 
such as scope loops and computation loops can be 
easily implemented to isolate timing problems. 

2.4 MICROCODE DETAILS 

Microcode field definitions for SMIP 

The 32-bit microword is divided up into fields as shown 
below: 

MSD31-28     Am2910A sequencer control 
MSD27-25     memory control 
MSD24        Am29116 IEN 
MSD23        Am29116 SRE 
MSD22-20     YBUS source 
MSD19-16     YBUS destination 
MSD15-0      Am29116 instruction or 
             immediate data 
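
The field layout above can be expressed as a small pack/unpack helper, which is roughly what an assembler back end or the debug tool would need when building or displaying a control word. This is a minimal Python sketch under stated assumptions: the short field names (`seq`, `mem`, `ien`, `sre`, `ysrc`, `ydst`, `inst`) and the example field values are hypothetical, and only the bit positions come from the table.

```python
# Field name -> (most significant bit, least significant bit) in the microword.
# Names are illustrative; bit positions follow the MSD table.
FIELDS = {
    "seq":  (31, 28),   # Am2910A sequencer control
    "mem":  (27, 25),   # memory control
    "ien":  (24, 24),   # Am29116 IEN
    "sre":  (23, 23),   # Am29116 SRE
    "ysrc": (22, 20),   # YBUS source
    "ydst": (19, 16),   # YBUS destination
    "inst": (15, 0),    # Am29116 instruction or immediate data
}


def pack(**values):
    """Build a 32-bit microword; fields not given default to 0."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError("unknown field(s): " + ", ".join(sorted(unknown)))
    word = 0
    for name, (msb, lsb) in FIELDS.items():
        width = msb - lsb + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << lsb
    return word


def unpack(word):
    """Split a 32-bit microword back into its named fields."""
    return {name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, (msb, lsb) in FIELDS.items()}


if __name__ == "__main__":
    # Arbitrary example values, not real mnemonic encodings
    word = pack(seq=0xE, mem=5, ien=1, sre=0, ysrc=7, ydst=0xF, inst=0x1234)
    print(f"{word:08X}", unpack(word))
```

The seven fields add up to exactly 32 bits (4 + 3 + 1 + 1 + 3 + 4 + 16), so no bit of the control word is left unassigned.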
Am29116 Instructions 

; single operand format 

MOVE    SORA    ; ram    to acc        Ram Name<, optional parms> 
COMP    SORY    ; ram    to y 
INC     SORS    ; ram    to status 
NEG     SOAR    ; acc    to ram 
        SODR    ; d      to ram 
        SOIR    ; i      to ram 
        SOZR    ; 0      to ram 
        SOZER   ; d(0e)  to ram 
        SOSER   ; d(se)  to ram 
        SORR    ; ram    to ram 

        SOA     ; acc    -> ?   NRY   ; ybus          <, optional parms> 
        SOD     ; d      -> ?   NRA   ; acc 
        SOI     ; i      -> ?   NRS   ; status 
        SOZ     ; 0      -> ?   NRAS  ; acc, status 
        SOZE    ; d(0e) 
        SOSE    ; d(se) 

; two operand instructions         R     S     D 

SUBR    ; s - r        TORAA   ; ram   acc   acc    Ram Name<, optional parms> 
SUBRC   ; s - r - c    TORIA   ; ram   i     acc 
SUBS    ; r - s        TODRA   ; d     ram   acc 
SUBSC   ; r - s - c    TORAY   ; ram   acc   y 
ADD     ; r + s        TORIY   ; ram   i     y 
ADDC    ; r + s + c    TODRY   ; d     ram   y 
AND     ; r and s      TORAR   ; ram   acc   ram 
NAND    ; r nand s     TORIR   ; ram   i     ram 
EXOR    ; r xor s      TODRR   ; d     ram   ram 
NOR     ; r nor s      TODAR   ; d     acc   ram 
OR      ; r or s       TOAIR   ; acc   i     ram 
EXNOR   ; r xnor s     TODIR   ; d     i     ram 
                       TODA    ; d     acc   ? 
                       TOAI    ; acc   i     ? 
                       TODI    ; d     i     ? 

; single bit shifts 

SHUPZ   ; up 0             SHRR   ; ram   ram     Ram Name<, optional parms> 
SHUP1   ; up 1             SHDR   ; d     ram 
SHUPL   ; up qlink 
SHDNZ   ; down 0 
SHDN1   ; down 1           SHA    ; acc           NRY or NRA<, optional parms> 
SHDNL   ; down qlink       SHD    ; d 
SHDNC   ; down qc 
SHDNOV  ; down qn xor qovr 

; bit oriented instructions 

SETNR   ; set ram bit n (n*512)        Bit#,Ram Name<, optional parms> 
RSTNR   ; reset ram bit n 
TSTNR   ; test ram bit n 
LD2NR   ; 2**n -> ram (n*512) 
LDC2NR  ; comp(2**n) -> ram 
A2NR    ; ram + 2**n -> ram 
S2NR    ; ram - 2**n -> ram 
TSTNA   ; test acc bit n 
RSTNA   ; reset acc bit n 
SETNA   ; set acc bit n 
A2NA    ; acc + 2**n -> acc 
S2NA    ; acc - 2**n -> acc            Bit#,<optional parms> 
LD2NA   ; 2**n -> acc 
LDC2NA  ; comp(2**n) -> acc 
TSTND   ; test d bit n 
RSTND   ; reset d bit n 
SETND   ; set d bit n 
A2NDY   ; d + 2**n -> y 
S2NDY   ; d - 2**n -> y 
LD2NY   ; 2**n -> y 
LDC2NY  ; comp(2**n) -> y 

; rotate by n bits 
;         u     d 

RTRA    ; ram   acc      Bit#,Ram Name<, optional parms> 
RTRY    ; ram   y 
RTRR    ; ram   ram 
RTAR    ; acc   ram 
RTDR    ; d     ram 
RTDY    ; d     y        Bit#<, optional parms> 
RTDA    ; d     acc 
RTAY    ; acc   y 
RTAA    ; acc   acc 

; rotate and merge 
;         u     r/d   s 

MDAI    ; d     acc   i      Bit#, Immediate Data<, optional parms> 
MDAR    ; d     acc   ram    Bit#, Register name<, optional parms> 
MDRI    ; d     ram   i      Bit#, Register name, Immediate Data<, optional parms> 
MDRA    ; d     ram   acc    Bit#, Register name<, optional parms> 
MARI    ; acc   ram   i      Bit#, Register name, Immediate Data<, optional parms> 
MRAI    ; ram   acc   i      Bit#, Register name, Immediate Data<, optional parms> 

; rotate and compare 

CDAI    ; d     acc   i      Bit#, Immediate Data<, optional parms> 
CDRI    ; d     ram   i      Bit#, Register name, Immediate Data<, optional parms> 
CDRA    ; d     ram   acc    Bit#, Register name<, optional parms> 
CRAI    ; ram   acc   i      Bit#, Register name, Immediate Data<, optional parms> 

; crc instruction 

CRCF    ; crc forward        Register Name<, optional parms> 
CRCR    ; crc reverse        Register Name<, optional parms> 

; status bit instructions 

SETST   ; set bit        ONCZ   ; OVR,N,C,Z      <, optional parms> 
RSTST   ; reset bit      L      ; link 
                         F1     ; f1 
                         F2     ; f2 
                         F3     ; f3 

SVSTR   ; save status in ram     Register name<, optional parms> 
SVSTNR  ; save status in NRY     <optional parms> 

; conditional jumps and calls 

JUMP    ; cond jump      Condition, address<, optional parms> 
CALL    ; cond call      Condition, address<, optional parms> 
CRET    ; cond return    Condition<, optional parms> 


; condition codes for Jump, Call & CRET 

CT      ; latched CT from 29116 
Z       ; latched Zero flag 
C       ; latched Carry flag 
N       ; latched Sign 
OVR     ; latched Overflow 
UNC     ; unconditional 
NCT     ; latched /CT 
NZ      ; latched not Zero 
NC      ; latched not Carry 
P       ; latched Positive 
NOVR    ; latched not Overflow 



NOOP ; nop 

Test condition instructions 



TNOZ 


(N xor OVR) 


TNO 


(N xor OVR) 


TZ 


Zero 


TOVR 


OVR 


TLOW 


low 


TC 


C 


TZC 


Z + /C 


TN 


N 


TL 


link 


TFl 


fl 


TF2 


f2 


TF3 


f3 


; priori 


tize 



<optional parms> 



<optional parms> 



PRTXYZ <ram namex. Immediate datax, optional parameters> 
where x = source 

y = mask 

z = destination 



R 


= 


ram 




A 


= 


accumulator 


D 


= 


d inputs 




I 


= 


immediate 


data 



Z = zero 

NR = no ram destination 



PRTARA 
PRTARY 
PRTARR 
PRTRAA 
PRTRZA 
PRTRIA 
PRTRAY 
PRTRZY 
PRTRIY 
PRTRAR 
PRTRZR 
PRTRIR 
PRTAAR 
PRTAZR 
PRTAIR 
PRTDAR 
PRTDZR 
PRTDIR 

PRTNRAA 
PRTNRAZ 
PRTNRAI 
PRTNRDA 
PRTNRDZ 
PRTNRDI 
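
The prioritize instructions rely on the Am29116 priority
encoder, which locates the highest-order ONE in a word. The
Python sketch below shows the idea only. The function name
is hypothetical, the result is a plain bit index (the
device's actual output code may be encoded differently),
and the treatment of the mask (a ONE in the mask removes
that bit from consideration) is an assumption.

```python
def highest_one(word, mask=0):
    """Return the bit index (15..0) of the highest-order ONE, or None if no bit is set.

    Bits that are ONE in `mask` are ignored (assumed mask polarity).
    """
    word &= 0xFFFF & ~mask
    if word == 0:
        return None
    return word.bit_length() - 1
```

With these assumptions, `highest_one(0x00F0, mask=0x0080)`
skips bit 7 and reports bit 6.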






29116 Internal Register names { internal RAM on 29116 } 

D0    ; 0 - D0 
D1    ; 1 - D1 
D2    ; 2 - D2 
D3    ; 3 - D3 
D4    ; 4 - D4 
D5    ; 5 - D5 
D6    ; 6 - D6 
D7    ; 7 - D7 
MP    ; 8 - MP local pointer 
BP    ; 9 - BP base pointer 
NP    ; 10 - NP heap pointer 
SP    ; 11 - SP stack pointer 
IPC   ; 12 - IPC temporary program counter 
SEGP  ; 13 - SEGP segment pointer 
JTAB  ; 14 - JTAB proc pointer 
TOS   ; 15 - top of stack element 
R0 
R1 
R2 
R3 
R4 
R5 
R6 
R7 
R8 
R9 
R10 
R11 
R12 
R13 
R14 
R15 






; The following field names are optional in any micro-instruction 
and are denoted by <optional parms> in the above list 

Sequencer control sc[0..3] 

JZ    ; jump to address 
CJS   ; conditional jsr via PL 
JMAP  ; jump to address via MAP 
CJP   ; jump to address via PL 
PUSH  ; push stack and cond load counter 
JSRP  ; jsr via R or PL 
CJV   ; cond jump to VECT 
JRP   ; jump to R or PL 
RFCT  ; repeat loop if CT <> 
RPCT  ; repeat PL if CT <> 
CRTS  ; conditional return 
CJPP  ; conditional jump to PL and pop stack 
LDCT  ; load CT 
LOOP  ; test end of loop 
CONT  ; continue 
TWB   ; three way branch 


; YBUS sources 

OEY   ; 29116 output enable Y 
DRS   ; data register 
YREG  ; YBUS data capture reg 
IOD   ; I/O data bus 
IR    ; instruction reg 
OEPM  ; 29517 msp product enable 
OEPL  ; 29517 lsp product enable 
NYS   ; no Ybus source 



YBUS destinations 

IOC    ; I/O control 
DR     ; data register 
PC     ; program counter 
AR     ; address register 
DLE    ; 29116 data latch 
PG     ; address page 
PCPG   ; program status word 
FMT    ; formats reg 
CTEN   ; condition code enable 
ENY    ; multiplier y input 
ENX    ; multiplier x input 
ICR    ; interrupt control dest 
ABEN   ; address reg readback 
PGEN   ; page reg readback 
REG    ; 2910 register/counter 
YSHFT  ; enable dynamic bit operations 
NYD    ; no Ybus destination 



; Memory Control codes 

NSR ; - no store request 

RPS ; read program store 

CRPS ; cond read program store 

WIO ; wait for memory io to finish 

RDS ; read data store 

WDS ; write data store 

RDSB ; read data store byte 

WDSB ; write data store byte 

; Status & Instruction control 

NSE ; status load disable 

NIE ; 29116 instruction disable 

SRE ; status update enable 

Chapter 3 
PERFORMANCE 



The primary objective in building the MIP Processor 
was to achieve an order-of-magnitude increase in 
performance over a standard 68000-based workstation 
in the execution of HLL (High Level Language) 
benchmarks. The comparison is based on the 
intermediate code concept of PASCAL. 

So far (July/85), a performance advantage of 8.25 in a 
linear weighting of op-codes has been observed. 
Most op-codes see a ten-fold or better advantage; 
therefore, with some fine-tuning of certain op-codes, 
an advantage of 10 is expected. 

A cache hit rate of 75% was estimated and 
measurements showed a hit rate of 71% on the linear 
weighting test. 

3.0 PRELIMINARY SURVEY 

A number of benchmarks are available for performance 
evaluation. 

The most notable one is the BYTE Sieve benchmark. 
A large body of data has been collected for this test. It 
mainly tests the array accessing and logic test 
capability of the machine. It is included here because 
it is easy to do and everyone else does it. 

A linear weighting of op-codes is quite useful for 
comparing the performances of two machines using 
the same upper level software. It allows for a 
quantitative measure of specific features of the new 
machine. When combined with the op-code run time 
frequency of occurrence, the specific benefit of a new 
feature can be evaluated. 
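
As a rough illustration of the idea, the Python sketch
below compares two machines from per-op-code execution
times. It is one plausible reading of "linear weighting",
not the authors' procedure: with no weights every op-code
counts equally, and passing run-time frequencies as weights
gives the frequency-weighted figure. The function name and
the timing values in the example are hypothetical.

```python
def speed_advantage(ref_times, new_times, weights=None):
    """Ratio of weighted total execution time, reference machine over new machine.

    ref_times / new_times: per-op-code times on the two machines, in the same order.
    weights: optional per-op-code weights (e.g. run-time frequency); equal if omitted.
    """
    if weights is None:
        weights = [1.0] * len(ref_times)
    ref_total = sum(w * t for w, t in zip(weights, ref_times))
    new_total = sum(w * t for w, t in zip(weights, new_times))
    return ref_total / new_total
```

For instance, with made-up times `[10.0, 20.0, 30.0]` and
`[1.0, 2.0, 6.0]`, the equal-weight advantage is about 6.7,
while weights of `[3, 1, 0]` (the slow third op-code never
executed) give exactly 10.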

Because the machine is intended to be used in a 
workstation environment, compilation speed is also an 
important benchmark. 

From time to time a workstation processor will perform a 
number of operations which are based on numerical 
algorithms. Of a large class of signal processing and 
statistical routines, the FFT is representative. 

Another often quoted benchmark is the Whetstone 
benchmark. It uses floating point arithmetic quite 
heavily. At this point the floating point micro-code has 
yet to be completed, therefore, this benchmark test 
has not been performed. 

In the following discussions of the benchmarks, all 
timings were measured using the self-timing 
capabilities of the machines. The machines all have 
access to a real-time clock with a resolution of 16.6 ms. 
Sufficient loops of each test were run to bring the 
timing resolution to within 1 ms. 
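
The loop count needed follows from simple arithmetic: if a
run of N iterations is timed with a clock that ticks every
16.6 ms, the uncertainty per iteration is about 16.6/N ms.
The Python sketch below works this out; it is an
interpretation of the procedure described above, and the
function names are hypothetical.

```python
import math


def loops_needed(tick_ms, target_ms):
    """Smallest loop count that brings the per-iteration resolution to target_ms or better."""
    return math.ceil(tick_ms / target_ms)


def resolution_per_loop(tick_ms, loops):
    """Timing resolution per iteration when `loops` iterations are timed together."""
    return tick_ms / loops
```

Under this reading, `loops_needed(16.6, 1.0)` gives 17
iterations for a 1 ms resolution.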

3.1 COMPILATION SPEED 

The compilation speed of the 68000 used is in the 
range of 900 to 1500 lines per minute. A typical value 
for compiling the compiler is 1150 lines per minute; 
the MIP processor can do this 6.5 times faster (7500 
lpm). 

3.2 SIEVE OF ERATOSTHENES 

The 68000 benchmark ran in 78 seconds, which reduces 
to 64 seconds at best if Wait States are removed. The 
MIP sieve ran in 5.6 seconds. If Wait States are 
removed from the MIP, better performance can be 
expected. 

Recently, 'streamlining' the Sieve bench has been 
done to take advantage of certain particular 
environments. There was one 'improved' version 
written in C that takes advantage of register coercion. 
The authors have no objection to that, but readers 
should consider the following: If the Sieve is rewritten 
as a microcode routine in the MIP, then the following 
program is the benchmark: 

Begin 

Sieve 
end; 

This program runs in approximately 0.14 seconds. 

Based on information in the BYTE article, a 68000 
assembly language version had a performance time of 
1.12 seconds (which was the fastest time quoted of 
any example). A microcoded sieve on the MIP still runs 
8 times faster. 
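
For readers unfamiliar with the benchmark, the sketch below
is a Python rendering of the sieve loop in the form it is
commonly published (an array of 8191 flags, each standing
for an odd number). It is offered as an illustration of the
work being timed and is not the authors' Pascal or
microcode; the function name is hypothetical.

```python
SIZE = 8190  # array bound used by the commonly published benchmark


def sieve(size=SIZE):
    """Count the odd primes found by one pass of the sieve."""
    flags = [True] * (size + 1)       # flags[i] stands for the odd number 2*i + 3
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # strike out the odd multiples of this prime
            for k in range(i + prime, size + 1, prime):
                flags[k] = False
            count += 1
    return count
```

One pass finds 1899 primes. The inner loop is dominated by
array indexing and a flag test, which is why the benchmark
mainly exercises those two capabilities.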



This illustrates the power of a microcoded approach. In 
practice, program bottlenecks are moved into 
microcode as they are encountered. Quite often 
significant system performance enhancements can be 
made by the addition of a few small intrinsic functions. 

3.3 FFT 

A micro-coded FFT routine was also created. It shows 
a dramatic increase over a similar program written for 
the 68000. There is a tendency to compare the 
micro-coded machine with special purpose FFT 
processors. This machine does not have the dual ALUs, 
dual memory banks, and dedicated address and 
coefficient ROMs normally associated with such a 
machine, so, as expected, it is about 6 to 8 times 
slower than an FFT processor. As a general purpose 
processor, it does a 1K complex FFT in 35 ms. This is 
a factor of 15 better than the 68000. 
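
The inner step of the routine is the butterfly. The Python
sketch below follows the arithmetic given in the comments of
the FFT microcode in Appendix A (s.r = b.r*w.r + b.i*w.i,
s.i = b.i*w.r - b.r*w.i, then a + s and a - s). It uses
plain Python numbers and leaves out the 16-bit scaling and
range checks of the microcode; the function names are
hypothetical.

```python
def butterfly(a, b, w):
    """One radix-2 butterfly on (real, imag) tuples; returns (new_a, new_b)."""
    s_r = b[0] * w[0] + b[1] * w[1]   # b.r*w.r + b.i*w.i => s.r
    s_i = b[1] * w[0] - b[0] * w[1]   # b.i*w.r - b.r*w.i => s.i
    new_a = (a[0] + s_r, a[1] + s_i)  # a + s => a
    new_b = (a[0] - s_r, a[1] - s_i)  # a - s => b
    return new_a, new_b


def butterflies(n):
    """Number of butterflies in a radix-2 FFT of n points (n a power of two)."""
    return (n // 2) * (n.bit_length() - 1)
```

A 1K transform needs `butterflies(1024)` = 5120 butterflies,
so 35 ms corresponds to roughly 6.8 microseconds per
butterfly, including loop overhead.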

3.4 ANALYSIS AND CONCLUSIONS 

The MIP processor has proven to be an effective 
vehicle for demonstrating the power and flexibility of a 
micro-coded machine. The P-code system has an 
overall performance improvement factor of 6 to 7 over 
a 68000-based system. While this is not as great as 
originally hoped, it remains a significant amount. 

Micro-coded intrinsic functions do experience a 
greater performance improvement factor than 68000 
assembly intrinsics. This comparative speed-up is due 
to the instruction set of the Am29116 and the pipeline 
stages of the processor. 

The small single set cache memory used is responsible 
for a 23% performance improvement when executing 
P-code programs. The hit rate averages 71%. These 
data are consistent with other single set cache 
memories that have been used on other processors 
and reported in the literature. 
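
A hit rate of this kind can be estimated from an address
trace. The Python sketch below models a cache in which each
address maps to exactly one line, which is how "single set"
is read here (an assumption). The function name, the line
count, and the example trace are hypothetical and say
nothing about the actual MIP cache organization.

```python
def hit_rate(addresses, num_lines):
    """Fraction of accesses that hit in a one-line-per-index cache."""
    tags = [None] * num_lines          # tag currently held by each line
    hits = 0
    for addr in addresses:
        index = addr % num_lines       # line selected by the low address bits
        tag = addr // num_lines        # remaining bits identify the block
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: replace the line
    return hits / len(addresses)
```

For example, the trace `[0, 1, 0, 1, 4, 0]` with four lines
hits twice; addresses 0 and 4 share a line and evict each
other.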

The processor is executing P-code with an effective 
processor utilization of 73%. Memory bus utilization is 
approximately 44%. Obviously a good system 
improvement could be made if these figures were 
closer to 100%. There are a couple of ways to do this. 
As each P-code is 'tuned', the number of processor 
cycles is reduced. Careful inspection of a P-code 
often allows the memory references to be arranged so 
as not to cause Wait States to occur; this improves 
processor utilization. Reducing the number of 
processor cycles per op-code increases memory 
utilization, because the number of memory 
references per P-code is constant. 
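
The relationship between the two utilization figures can be
made concrete with a small model. In the Python sketch
below, a P-code takes some number of useful processor
cycles plus wait cycles, and makes a fixed number of memory
references that each occupy the bus for a fixed number of
cycles. The model, the function name, and the numbers in
the example are assumptions for illustration only.

```python
def utilization(proc_cycles, wait_cycles, mem_refs, cycles_per_ref):
    """Return (processor utilization, memory bus utilization) for one P-code."""
    total = proc_cycles + wait_cycles
    processor = proc_cycles / total
    memory = mem_refs * cycles_per_ref / total
    return processor, memory
```

With 8 useful cycles, 2 wait cycles, and 2 references of 2
cycles each, the result is (0.8, 0.4). Tuning the same
P-code down to 6 useful cycles leaves the memory traffic
unchanged, so bus utilization rises to 0.5.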

Minor changes to the definition of the P-codes can 
also increase system performance. This was not done 
because that part of the system was kept constant to 
compare with other machines. 

Probably the single largest improvement on the 
processor would be a more complex BIU. Such a unit 
would have a dedicated stack address and data 
register. It was not feasible to include this in the 
original design due to a limitation of board real estate. 
There are a number of multi-port register files (5 or 6 
port) available now which would allow the entire BIU to 
be reduced in size with increased functionality. This 
could be done without increasing the micro-code 
width. Some machines have added a dedicated stack 
area. This limits the stack to a fixed size and location 
which can cause problems. With a stack address and 
data register working into a cache based memory 
system, the delays due to a quantity not being 
available in the stack register are minimal. The 
processor would have access to the top two items of 
the stack with no delay at all (TOS inside the 
Am29116, TOS-1 in the stack data register). 
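
The present arrangement, with only TOS held inside the
Am29116, can be sketched as follows. The Python model below
mirrors the STACKW and STACKR macros of Appendix A as they
are read here: a push writes the old TOS to memory at
SP - 2, and a pop or a two-operand P-code such as ADI reads
TOS-1 from memory at SP and then adds 2 to SP. The class
and method names are hypothetical, and the step size of 2
assumes that S2NR/A2NR with operand 1 subtract/add 2.

```python
class CachedStack:
    """Word stack that grows downward in memory with the top element in a register."""

    def __init__(self, sp):
        self.mem = {}        # data store, addressed in bytes
        self.sp = sp         # stack pointer (points at TOS-1 in memory)
        self.tos = 0         # top of stack, held in the processor
        self.mem_reads = 0

    def push(self, value):
        """STACKW followed by loading the new value into TOS."""
        self.sp -= 2
        self.mem[self.sp] = self.tos
        self.tos = value

    def pop(self):
        """Return TOS and refill it from memory (STACKR)."""
        value = self.tos
        self.tos = self.mem[self.sp]
        self.mem_reads += 1
        self.sp += 2
        return value

    def adi(self):
        """ADI: add TOS-1 and TOS; costs one memory read."""
        self.tos = (self.mem[self.sp] + self.tos) & 0xFFFF
        self.mem_reads += 1
        self.sp += 2
```

In this model every two-operand P-code costs one memory
read for TOS-1. A dedicated stack data register holding
TOS-1, as proposed above, would remove that read from the
critical path.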

The 2 byte pre-fetch mechanism appears to work well. 
The average P-code is less than 2 bytes. If the bus 
utilization were very high (>90%), it may be necessary 
to have more than two bytes pre-fetched so as to 
minimize op-code waits within a P-code. 

As a single board processor, this design is very 
effective from a performance point of view. The 
standard functionality offered by a 68000, or other 
such processor, is available, with the added ability to 
have micro-coded intrinsic functions. The effort to 
create a micro-coded intrinsic function is the same as 
writing assembly level routines for a 68000 (in fact, it is 
often easier due to the diverse nature of the Am29116 
micro-code instructions). 

The 4k size of the micro-store is adequate to allow the 
coding of a high-level intermediate code such as 
P-code and allows ample room (1.5k) for intrinsic 
functions. 

The technique of using the 29818 diagnostic registers 
for trace and debug of the processor is effective. A 
program written in Pascal performs all functions 
required to initialize, load, and test the processor. This 
diagnostic program can be transported to virtually any 
workstation. A simple port gives access to the 
processor under test. 

Appendix A 








Sample Micro-Code 




0017                        SLDLX    STACKW  TOS  ; load ith local word to stack
0017  EE81 D850           #          MOVE    SORY,TOS,OEY,DR,NSE,WIO
0018  EC83 C3EB           #          S2NR    1,SP,OEY,AR,WDS,NSE
0019  E07F E481 E07F FF2E            ADD     TOAI,NRA,<<MSLCL/2>-215>  ; base value + offset
001B  E00F EC01                      SHUPZ   SHA,NRA,OEY  ; * 2
001C  EA03 9088                      ADD     TORAY,MP,OEY,AR,RDS  ; add mp and do read
001D  E01F D8D0                      MOVE    SODR,TOS,DRS  ; wait for read to complete
001E                                 IFETCH
001E  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

004F                        LDO      STACKW  TOS  ; load local with offset B
004F  EE81 D850           #          MOVE    SORY,TOS,OEY,DR,NSE,WIO
0050  EC83 C3EB           #          S2NR    1,SP,OEY,AR,WDS,NSE
0051                                 GBIG
0051  E44F D946           #          MOVE    SOSER,D6,IR,CRPS
0052  61F8 ****           #          JUMP    P,$1
0053  E07F 91E6           #          RTRR    8,D6
0054  E07F FFC6           #          RSTNR   15,D6
0055  E44F 58C6           #          MOVE.B  SODR,D6,IR,CRPS
0056                      # $1
0056  E07F 8486 E07F 0005            ADD     TORIA,D6,<MSLCL/2>
0058  E07F EC01                      SHUPZ   SHA,NRA  ; * 2
0059  EA03 9089                      ADD     TORAY,BP,OEY,AR,RDS  ; add bp and do read
005A  E01F D8D0                      MOVE    SODR,TOS,DRS
005B                                 IFETCH
005B  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

005C                        LAO      STACKW  TOS  ; load address of B'th local
005C  EE81 D850           #          MOVE    SORY,TOS,OEY,DR,NSE,WIO
005D  EC83 C3EB           #          S2NR    1,SP,OEY,AR,WDS,NSE
005E                                 GBIG
005E  E44F D946           #          MOVE    SOSER,D6,IR,CRPS
005F  61F8 ****           #          JUMP    P,$1
0060  E07F 91E6           #          RTRR    8,D6
0061  E07F FFC6           #          RSTNR   15,D6
0062  E44F 58C6           #          MOVE.B  SODR,D6,IR,CRPS
0063                      # $1
0063  E07F 8486 E07F 0005            ADD     TORIA,D6,<MSLCL/2>
0065  E07F EC01                      SHUPZ   SHA,NRA  ; * 2
0066  E00F 8089                      ADD     TORAA,BP,OEY
0067  E00F D890                      MOVE    SOAR,TOS,OEY
0068                                 IFETCH
0068  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

009C                        LDC      STACKW  TOS
009C  EE81 D850           #          MOVE    SORY,TOS,OEY,DR,NSE,WIO
009D  EC83 C3EB           #          S2NR    1,SP,OEY,AR,WDS,NSE
009E  E44F D931                      MOVE    SOZER,R1,IR,CRPS  ; get length of block
009F  EE7F 7140                      NOOP    WIO
00A0  E07C 7140                      NOOP    ABEN
00A1  E07C E190                      TSTND   0,ABEN  ; test LSB of PC
00A2  61F8 ****                      JUMP    Z,$1  ; if word aligned
00A3  E44F 7140                      NOOP    IR,CRPS  ; dump odd byte
00A4  E44F D920             $1       MOVE    SOZER,D0,IR,CRPS  ; may want flip
00A5  E07F 91E0                      RTRR    8,D0
00A6  E44F 58C0                      MOVE.B  SODR,D0,IR,CRPS
00A7                                 STACKW  D0  ; move 1 word






00A7  EE81 D840           #          MOVE    SORY,D0,OEY,DR,NSE,WIO
00A8  EC83 C3EB           #          S2NR    1,SP,OEY,AR,WDS,NSE
00A9  E07F C1F1                      S2NR    0,R1
00AA  61F8 9F5B                      JUMP    NZ,$1  ; loop for all
00AB                                 STACKR
00AB  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
00AC  E0FF C3CB           #          A2NR    1,SP,NSE
00AD  E01F D8D0                      MOVE    SODR,TOS,DRS
00AE                                 IFETCH
00AE  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

00CB                        STO      STACKR  ; read address
00CB  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
00CC  E0FF C3CB           #          A2NR    1,SP,NSE
00CD  EE01 D970                      MOVE    SORR,TOS,OEY,DR  ; data
00CE  EC13 7140                      NOOP    DRS,DR,AR,WDS  ; write it
00CF                                 STACKR
00CF  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
00D0  E0FF C3CB           #          A2NR    1,SP,NSE
00D1  E01F D8D0                      MOVE    SODR,TOS,DRS
00D2                                 IFETCH
00D2  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

00D3  E00F E441 E00F 00F8   SIND)    SUBS    TOAI,NRA,248,OEY  ; adjust offset
00D5  E00F EC01                      SHUPZ   SHA,NRA,OEY
00D6  EA03 9090                      ADD     TORAY,TOS,OEY,AR,RDS  ; get the data
00D7  E01F D8D0                      MOVE    SODR,TOS,DRS
00D8                                 IFETCH
00D8  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

00F1                        IXA      STACKR  ; get base
00F1  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
00F2  E0FF C3CB           #          A2NR    1,SP,NSE
00F3  11F8 7FED                      CALL    UNC,GETBIG  ; get element size
00F4  E00A CC06                      SHUPZ   SHRR,D6,OEY,ENX  ; * 2 to mult
00F5  E009 D970                      MOVE    SORR,TOS,OEY,ENY  ; * element size
00F6  E00F 7140                      NOOP    OEY
00F7  E06F F8C1                      MOVE    SOD,NRA,OEPL
00F8  E01F C290                      ADD     TODAR,TOS,DRS  ; + base => tos
00F9                                 IFETCH
00F9  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

018D                        ADI      STACKR  ; add TOS-1 and TOS
018D  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
018E  E0FF C3CB           #          A2NR    1,SP,NSE
018F  E01F 9E90                      ADD     TODRR,TOS,DRS
0190                                 IFETCH
0190  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

01F9                        LESI     STACKR
01F9  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
01FA  E0FF C3CB           #          A2NR    1,SP,NSE
01FB  E01F 9E50                      SUBS    TODRR,TOS,DRS
01FC  E00F 7342                      TNO     OEY  ; test for LT
01FD  E08F D910                      MOVE    SOZR,TOS,OEY,NSE
01FE  61F8 ****                      JUMP    NCT,$1
01FF  E00F C1F0                      S2NR    0,TOS,OEY
0200                        $1       IFETCH
0200  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

044F  E44F D946             UJP      MOVE    SOSER,D6,IR,CRPS  ; get jump code
0450                        ;
0450  61F8 ****             JMP      JUMP    N,JTABJMP  ; if neg use jtab
0451                                 READ_PC










FFT ROUTINE

micro-coded fft routine for mip processor

register assignments

D0  - s.r
D1  - s.i
D2  - temp
D3  - scale check
D4  - w.r
D5  - w.i
D6  - minor loop
D7  - major loop
R1  - original data array pointer
R2  - sin/cos table
R3  - offset
R4  - bigstep
R5  - scale count
R6  - big limit
R7  - pass count
R8  - number of points
R9  - working data pointer to A component
R10 - working data pointer to B component
R11 - shuffle array
R12 - check pointer

                                     .MACRO  RCHEK  ; macro to do range check on data
                                     MOVE    SODR,D3,YREG
                                     JUMP    P,$1
                                     NEG     SODR,D3
                            $1       S2NR    14,D3,OEY
                                     JUMP    NC,$2
                                     SETST   F3  ; set flag 3 for scaling
                            $2
                                     .ENDM

092A  EA03 C5CB             FLUTERBY A2NR    2,SP,OEY,AR,RDS
092B  E01F D8DB                      MOVE    SODR,R11,DRS  ; shuffle pointer
092C  EA03 C3CB                      A2NR    1,SP,OEY,AR,RDS
092D  E01F D8D2                      MOVE    SODR,R2,DRS  ; sin/cos table
092E  EA03 C3CB                      A2NR    1,SP,OEY,AR,RDS
092F  E01F D8D1                      MOVE    SODR,R1,DRS  ; data pointer
0930  EA03 C3CB                      A2NR    1,SP,OEY,AR,RDS
0931  E01F D8D7                      MOVE    SODR,R7,DRS  ; pass count
0932
0932  E07F D918                      MOVE    SOZR,R8
0933  E01E E1B8                      SETNR   0,R8,DRS,YSHFT  ; number of points
0934  E00F CC18                      SHUPZ   SHRR,R8,OEY
0935  E02F D8D3                      MOVE    SODR,R3,YREG  ; offset = 1/2 byte count
0936  E00F D915                      MOVE    SOZR,R5,OEY  ; zero scale count
0937  E02F DCD6                      INC     SODR,R6,YREG  ; big limit = 1 to start
0938  11F8 ****                      CALL    UNC,RNGCHEK
0939  E07F F904                      MOVE    SOZ,NRS
093A
093A  E00F 7356             FFTL     TF3     OEY
093B  11F8 ****                      CALL    CT,SCALE  ; scale the data
093C
093C  E07F D914                      MOVE    SOZR,R4  ; bigstep
093D  E07F D816                      MOVE    SORA,R6  ; outer loop
093E  E07F D887                      MOVE    SOAR,D7
093F  E07F D811                      MOVE    SORA,R1






0451  EE7F 7140           #          NOOP    WIO
0452  E07C 7140           #          NOOP    ABEN
0453  E07C F8C1           #          MOVE    SOD,NRA,ABEN
0454  EE02 8086                      ADD     TORAA,D6,OEY,PC  ; re-load pc & fetch
0455  E206 F900                      MOVE    SOZ,NRY,OEY,PCPG,RPS
0456                                 IFETCH
0456  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP
0457                        ;
0457  E00F D80E             JTABJMP  MOVE    SORA,JTAB,OEY
0458  EA03 8086                      ADD     TORAA,D6,OEY,AR,RDS
0459  E01F E201                      SUBR    TODA,NRA,DRS  ; self-relative
045A  EE02 F880                      MOVE    SOA,NRY,OEY,PC  ; load PC and fetch
045B  E206 F900                      MOVE    SOZ,NRY,OEY,PCPG,RPS
045C                                 IFETCH
045C  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP
045D
045D                        ; FJP ; JUMP IF TOS IS FALSE.
045D
045D                        FJP      STACKR
045D  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
045E  E0FF C3CB           #          A2NR    1,SP,NSE
045F  E07F E1F0                      TSTNR   0,TOS
0460  E09F D8D0                      MOVE    SODR,TOS,DRS,NSE  ; refresh TOS
0461  61F8 1BB0                      JUMP    Z,UJP
0462  E44F D926             NOJ      MOVE    SOZER,D6,IR,CRPS  ; dump jump byte
0463                                 IFETCH
0463  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP

0474  EE7F 7140             XJP      NOOP    WIO
0475  E07C 7140                      NOOP    ABEN
0476  E07C D8CC                      MOVE    SODR,IPC,ABEN  ; copy of PC
0477  E00F E1EC                      TSTNR   0,IPC,OEY  ; align PC
0478  61F8 ****                      JUMP    Z,$1
0479  E07F C1CC                      A2NR    0,IPC  ; make it even
047A  EA03 D96C             $1       MOVE    SORR,IPC,OEY,AR,RDS  ; lower index
047B  E01F 8610                      SUBR    TODRA,TOS,DRS  ; (TOS - lower) => case index
047C  EA83 C3CC                      A2NR    1,IPC,OEY,AR,RDS,NSE  ; upper index
047D  E0FF C3CC                      A2NR    1,IPC,NSE  ; inc to branch out
047E  61F8 ****                      JUMP    N,$2
047F  E01F 9650                      SUBS    TODRY,TOS,DRS  ; (upper - TOS)
0480  61F8 ****                      JUMP    N,$2
0481  E07F EC01                      SHUPZ   SHA,NRA  ; make byte index
0482  E00F C3CC                      A2NR    1,IPC,OEY  ; inc to table base
0483  EA03 988C                      ADD     TORAR,IPC,OEY,AR,RDS  ; index into case table & read
0484  E01F 9E0C                      SUBR    TODRR,IPC,DRS  ; self-relative
0485  EE02 D96C             $2       MOVE    SORR,IPC,OEY,PC  ; update PC & fetch
0486  E206 F900                      MOVE    SOZ,NRY,OEY,PCPG,RPS
0487                                 STACKR
0487  EA83 D96B           #          MOVE    SORR,SP,OEY,AR,NSE,RDS
0488  E0FF C3CB           #          A2NR    1,SP,NSE
0489  E01F D8D0                      MOVE    SODR,TOS,DRS  ; new TOS
048A                                 IFETCH
048A  244F F921           #          MOVE    SOZE,NRA,IR,CRPS,JMAP








0940  E07F D89A                      MOVE    SOAR,R10  ; pre-load with base address
0941                        ;
0941  E07F D81A             MAJ      MOVE    SORA,R10
0942  E07F D899                      MOVE    SOAR,R9  ; new A pointer
0943  E07F D813                      MOVE    SORA,R3
0944  E07F D886                      MOVE    SOAR,D6  ; minor loop count
0945  E07F 989A                      ADD     TORAR,R10  ; new B pointer
0946  E07F D814                      MOVE    SORA,R4  ; bigstep
0947  EA03 909B                      ADD     TORAY,R11,OEY,AR,RDS  ; index into shuffl
0948  E01F E599                      RTDA    2,DRS  ; shuffl * 4
0949  EA03 8092                      ADD     TORAA,R2,OEY,AR,RDS  ; w.r
094A  E01F D8C4                      MOVE    SODR,D4,DRS
094B  EA03 E384                      A2NA    1,OEY,AR,RDS
094C  E01F D8C5                      MOVE    SODR,D5,DRS  ; w.i
094D                        ;
094D  EA03 D81A             BFLY     MOVE    SORA,R10,OEY,AR,RDS  ; fetch b.r
094E  E00A D844                      MOVE    SORY,D4,OEY,ENX  ; w.r => X
094F  E019 D8C2                      MOVE    SODR,D2,DRS,ENY  ; b.r => y
0950  EA03 C3DA                      A2NR    1,R10,OEY,AR,RDS  ; fetch b.i
0951  E05F F8C1                      MOVE    SOD,NRA,OEPM  ; b.r * w.r
0952  E00A D845                      MOVE    SORY,D5,OEY,ENX  ; w.i => X
0953  E019 D8C1                      MOVE    SODR,D1,DRS,ENY  ; b.i => y
0954  EA03 D859                      MOVE    SORY,R9,OEY,AR,RDS  ; fetch a.r
0955  E05F C280                      ADD     TODAR,D0,OEPM  ; b.r*w.r + b.i*w.i => s.r
0956
0956  E009 D842                      MOVE    SORY,D2,OEY,ENY  ; b.r => y
0957  E01F D8C2                      MOVE    SODR,D2,DRS  ; a.r => D2
0958  E05F F8C1                      MOVE    SOD,NRA,OEPM  ; b.r*w.i
0959  E00A D844                      MOVE    SORY,D4,OEY,ENX  ; w.r => X
095A  E009 D841                      MOVE    SORY,D1,OEY,ENY  ; b.i => y
095B  EA03 C3D9                      A2NR    1,R9,OEY,AR,RDS  ; a.i
095C  E05F C201                      SUBR    TODAR,D1,OEPM  ; b.i*w.r - b.r*w.i => s.i
095D
095D  E07F D802                      MOVE    SORA,D2  ; a.r
095E  EE03 C3FA                      S2NR    1,R10,OEY,AR
095F  EC01 9000                      SUBR    TORAY,D0,OEY,DR,WDS  ; a.r - s.r => b.r


0960                                 RCHEK
0960  E02F D8C3           #          MOVE    SODR,D3,YREG
0961  61F8 ****           #          JUMP    P,$1
0962  E07F DEC3           #          NEG     SODR,D3
0963  E00F DDE3           # $1       S2NR    14,D3,OEY
0964  61F8 ****           #          JUMP    NC,$2
0965  E07F 774A           #          SETST   F3
0966                      # $2
0966  EE03 C3F9                      S2NR    1,R9,OEY,AR
0967  EC01 9080                      ADD     TORAY,D0,OEY,DR,WDS  ; a.r + s.r => a.r
0968                                 RCHEK
0968  E02F D8C3           #          MOVE    SODR,D3,YREG
0969  61F8 ****           #          JUMP    P,$1
096A  E07F DEC3           #          NEG     SODR,D3
096B  E00F DDE3           # $1       S2NR    14,D3,OEY
096C  61F8 ****           #          JUMP    NC,$2
096D  E07F 774A           #          SETST   F3
096E                      # $2
096E  E01F F8C1                      MOVE    SOD,NRA,DRS  ; a.i
096F  EE03 C3DA                      A2NR    1,R10,OEY,AR
0970  EC01 9001                      SUBR    TORAY,D1,OEY,DR,WDS  ; a.i - s.i => b.i
0971                                 RCHEK
0971  E02F D8C3           #          MOVE    SODR,D3,YREG
0972  61F8 ****           #          JUMP    P,$1
0973  E07F DEC3           #          NEG     SODR,D3
0974  E00F DDE3           # $1       S2NR    14,D3,OEY
0975  61F8 ****           #          JUMP    NC,$2
0976  E07F 774A           #          SETST   F3
0977                      # $2
0977  E07F C3DA                      A2NR    1,R10  ; for auto-inc
0978
0978  EE03 C3D9                      A2NR    1,R9,OEY,AR
0979  EC01 9081                      ADD     TORAY,D1,OEY,DR,WDS  ; a.i + s.i => a.i
097A                                 RCHEK
097A  E02F D8C3           #          MOVE    SODR,D3,YREG
097B  61F8 ****           #          JUMP    P,$1
097C  E07F DEC3           #          NEG     SODR,D3
097D  E00F DDE3           # $1       S2NR    14,D3,OEY
097E  61F8 ****           #          JUMP    NC,$2
097F  E07F 774A           #          SETST   F3
0980                      # $2
0980  E07F C3D9                      A2NR    1,R9  ; for auto-inc


0981
0981  E07F C5E6                      S2NR    2,D6
0982  61F8 96B2                      JUMP    NZ,BFLY  ; loop in bfly
0983
0983  E07F C5D4                      A2NR    2,R4  ; for indxng shufl array (bigstep)
0984
0984  E07F C1E7                      S2NR    0,D7
0985  61F8 96BE                      JUMP    NZ,MAJ  ; major loop
0986
0986  E07F CC16                      SHUPZ   SHRR,R6  ; big limit
0987  E07F CC93                      SHDNZ   SHRR,R3  ; offset
0988
0988  E07F C1F7                      S2NR    0,R7
0989  61F8 96C5                      JUMP    NZ,FFTL  ; passes
098A  E07F D815                      MOVE    SORA,R5
098B  A07F D890                      MOVE    SOAR,TOS,CRTS  ; return scale count in TOS
098C                        ;
098C  EA03 D811             RNGCHEK  MOVE    SORA,R1,OEY,AR,RDS
098D  E07F D89C                      MOVE    SOAR,R12  ; data pointer
098E  E07F D818                      MOVE    SORA,R8
098F  E07F D886                      MOVE    SOAR,D6
0990  E07F D887                      MOVE    SOAR,D7  ; # of points
0991  E07F D8E0 E07F 3000            MOVE    SOIR,D0,3000H  ; data limit
0993                        ;
0993  E01F F8C1             $1       MOVE    SOD,NRA,DRS  ; get data
0994  EA83 C3DC                      A2NR    1,R12,OEY,AR,RDS,NSE
0995  61F8 ****                      JUMP    P,$2
0996  E07F FE81                      NEG     SOA,NRA
0997  E00F 9000             $2       SUBR    TORAY,D0,OEY  ; do the compare
0998  61F8 ****                      JUMP    C,$3
0999  E07F C1E6                      S2NR    0,D6
099A  61F8 966C                      JUMP    NZ,$1
099B  A1F8 7FFF                      CRET    UNC








099C                        ;
099C  E07F 7543             $3       RSTST   ONCZ  ; clear status bits
099D  E07F C1D5                      A2NR    0,R5  ; inc scale count
099E  E07F D811                      MOVE    SORA,R1
099F  E07F D89C                      MOVE    SOAR,R12  ; init data pointer
09A0  E07F C3FC                      S2NR    1,R12
09A1
09A1  EA03 C3DC             $4       A2NR    1,R12,OEY,AR,RDS  ; get data
09A2  E01F D8C0                      MOVE    SODR,D0,DRS
09A3  EC01 CD00                      SHDNOV  SHRR,D0,OEY,DR,WDS  ; write back shifted data
09A4  E07F C1E7                      S2NR    0,D7
09A5  61F8 965E                      JUMP    NZ,$4
09A6  A1F8 7FFF                      CRET    UNC
09A7                        ;
09A7  E00F D811             SCALE    MOVE    SORA,R1,OEY
09A8  E07F D89C                      MOVE    SOAR,R12  ; data pointer
09A9  E07F C3FC                      S2NR    1,R12
09AA  E07F D818                      MOVE    SORA,R8
09AB  E07F D887                      MOVE    SOAR,D7
09AC  E07F C1D5                      A2NR    0,R5  ; inc scale count
09AD  E07F F904                      MOVE    SOZ,NRS
09AE  E07F 7543                      RSTST   ONCZ  ; clear status bits
09AF
09AF  EA03 C3DC             $4       A2NR    1,R12,OEY,AR,RDS  ; get data
09B0  E01F D8C0                      MOVE    SODR,D0,DRS
09B1  EC01 CD00                      SHDNOV  SHRR,D0,OEY,DR,WDS  ; write back shifted data
09B2  E07F C1E7                      S2NR    0,D7
09B3  61F8 9650                      JUMP    NZ,$4
09B4  A1F8 7FFF                      CRET    UNC
09B5



The Am29XXX Family from Advanced Micro Devices 
Am29116 Processor 



This circuit is a micro-programmable 16-bit processor. 
In addition to its complete arithmetic and logic 
functions, it contains functions that are particularly 
useful in controller applications: Bit Set, Bit Reset, Bit 
Test, Rotate and Merge, Rotate and Compare, and 
Cyclic-Redundancy-Check (CRC) Generation. The 
device consists of the following functional blocks 
(Figure B1): 

1) The 32-word by 16-bit RAM is a single-port RAM 
with a latch at its output. With the use of an 
external multiplexer, it is possible to select 
separate read and write addresses for the same 
instruction. 

2) The accumulator is an edge-triggered register. 

3) The data latch is able to hold data when DLE is 
Low. 

4) The barrel shifter rotates data up to 15 positions. 

5) The ALU has full carry lookahead across all 16 bits 
in the arithmetic mode. It has the ability to execute 
all conventional one- and two-operand operations. 
In addition, it can also execute three-operand 
instructions such as Rotate and Merge, and Rotate 
and Compare with masks. It provides 3 status 
outputs, C (carry), N (negative) and OVR (overflow). 

6) The priority encoder produces a binary-weighted 
code to indicate the location of the highest order 
ONE in the data. 

7) The status register holds 8 status bits: 

Flag3 Flag2 Flag1 Link OVR N C Z 












Figure B1. Am29116 Block Diagram 

8) The Condition-Code Generator/Multiplexer contains 
the logic necessary to develop the 12 condition-code 
test signals. 

9) The 16-bit instruction latch is normally transparent 
to allow decoding of the instruction inputs by the 
decoder. All instructions, except immediate 
instructions, are executed in a single clock cycle. 
An immediate instruction requires 2 clock cycles for 
execution. 
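
The eight status bits of item 7 can be handled as a single
byte. The Python sketch below packs and unpacks them,
assuming the left-to-right order shown above runs from bit 7
(Flag3) down to bit 0 (Z). That bit assignment is an
assumption made for this example, and the names are
hypothetical.

```python
# Bit 0 first; assumed to mirror "Flag3 Flag2 Flag1 Link OVR N C Z" read right to left.
STATUS_BITS = ("Z", "C", "N", "OVR", "LINK", "FLAG1", "FLAG2", "FLAG3")


def pack_status(**flags):
    """Build the 8-bit status word from named flags (missing flags are 0)."""
    value = 0
    for bit, name in enumerate(STATUS_BITS):
        if flags.get(name, False):
            value |= 1 << bit
    return value


def unpack_status(value):
    """Split an 8-bit status word into a dictionary of named flags."""
    return {name: bool((value >> bit) & 1) for bit, name in enumerate(STATUS_BITS)}
```

For example, `pack_status(Z=True, OVR=True)` gives 0x09
under this bit assignment.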



Am29517 Multiplier 



This circuit performs the parallel multiplication of two 
16-bit numbers, X and Y (see Figure B2). The product P 
is generated in the form of two 16-bit words that can be 
read out one after the other on bus P, or both together, 
with the more significant bits on bus P and the less 
significant bits on bus Y. Control signals allow the part: 

1) To accept the numbers X and Y, after an enable bit 
(ENX, ENY). The data is then stored in an input 
register simultaneously with a flag (XM, YM) 
specifying whether the numbers are unsigned or in 
two's complement. 

2) To define the output format as 32 or 31 bits. The 
31-bit configuration is used if the data are two's 
complement fractions. 

3) To use a transparent or pipelined output structure 
(FT). For a pipelined structure, an enable bit is 
necessary (ENP). This configuration is the fastest, 
with a 65 ns maximum cycle time. 

4) To switch some buses (OEP, OEL) to high 
impedance. 

5) To round the 16 most significant bits when the 16 
less significant bits are not used (RND). 

Figure B2 shows the internal diagram of this circuit. 
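
The arithmetic behind the mode flags, the two 16-bit product words and the rounding control can be sketched as follows. This is a behavioural illustration only: the function and parameter names are hypothetical, rounding is assumed to add one at the most significant bit of the less significant word, and the 31-bit fractional output format is not modelled.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def multiply16(x, y, x_signed=False, y_signed=False, rnd=False):
    """Return (more significant word, less significant word) of X * Y.

    x_signed / y_signed play the role of the XM / YM mode flags.
    rnd is an assumed model of RND: add 0x8000 before the upper word is used.
    """
    a = to_signed16(x) if x_signed else x & 0xFFFF
    b = to_signed16(y) if y_signed else y & 0xFFFF
    product = a * b
    if rnd:
        product += 0x8000
    product &= 0xFFFFFFFF          # keep the 32-bit result
    return product >> 16, product & 0xFFFF
```

With both operands unsigned, `multiply16(0xFFFF, 0xFFFF)` gives the words 0xFFFE and 0x0001; with both flagged as two's complement, the same bit patterns represent (-1) x (-1) and the result is 0x0000 and 0x0001.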



Figure B2. Am29517 Block Diagram 



Diagnostics-WCS Pipeline Register 



This circuit is an 8-bit pipeline register with an on-board 
shadow register. 

1) The pipeline register can load parallel data to or from 
the shadow register, input data from the D port, and 
output data to the Y port. 

2) The shadow register can load parallel data to or from 
the pipeline register and can output data through 
the D input port (as in WCS loading). It can also 
input serial data from the SDI input and output serial 
data through the SDO output. 

Figure B4 shows the internal diagram of this circuit. 
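
The data paths listed above can be modelled with a small Python class. It is a sketch under stated assumptions: the class and method names are hypothetical, the shift direction (SDI enters at bit 0, SDO leaves from bit 7) is assumed, and the output of shadow data through the D port is left out.

```python
class DiagnosticPipelineRegister:
    """Behavioural sketch of an 8-bit pipeline register with a shadow register."""

    def __init__(self):
        self.pipeline = 0   # drives the Y port
        self.shadow = 0     # serial diagnostics register

    def load_pipeline_from_d(self, d):
        self.pipeline = d & 0xFF

    def load_pipeline_from_shadow(self):
        self.pipeline = self.shadow

    def load_shadow_from_pipeline(self):
        self.shadow = self.pipeline

    def shift_shadow(self, sdi):
        """Shift one bit in from SDI and return the bit leaving on SDO."""
        sdo = (self.shadow >> 7) & 1
        self.shadow = ((self.shadow << 1) | (sdi & 1)) & 0xFF
        return sdo

    def y_output(self):
        return self.pipeline


def shift_in_byte(reg, byte):
    """Serially load one byte, most significant bit first."""
    for bit in range(7, -1, -1):
        reg.shift_shadow((byte >> bit) & 1)
```

Shifting a byte in with `shift_in_byte` and then calling `load_pipeline_from_shadow` forces a chosen value onto the Y port, while `load_shadow_from_pipeline` followed by eight shifts reads the current pipeline contents back out serially.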




Figure B4. Pipeline Register Block Diagram 



Am2910A Microprogram Sequencer 



This is a 12-bit address sequencer intended for 
controlling the execution sequence of microinstructions 
stored in the microprogram memory. It consists of the 
following 5 functional blocks (Figure B3). 

1) The four-input multiplexer selects one of the 
following four sources: 

µPC Microprogram Counter 
D Direct Input 
R Register/Counter 
F Stack 

2) The microprogram counter is composed of an 
incrementer followed by a register. 

3) The internal loop counter is a presettable down- 
counter for repeating instructions and counting 
loop iterations. 

4) The stack provides return-address linkage when 
executing micro-subroutines or loops. 

5) The built-in decoder enables one of the following 
three direct input sources: 

PL Pipeline Register 
MAP MAP PROM 
VECT Interrupt Vector 
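
How the multiplexer sources, the microprogram counter, the loop counter and the stack cooperate can be sketched in Python. This is a simplified model, not the device's instruction set: the class and method names are hypothetical, the stack depth is unbounded, and only a handful of next-address choices are shown.

```python
class SequencerSketch:
    """Simplified next-address logic for a 12-bit microprogram sequencer."""

    ADDR_MASK = 0xFFF

    def __init__(self):
        self.upc = 0        # register fed by the incrementer
        self.counter = 0    # loop down-counter (R)
        self.stack = []     # return-address stack (F)

    def _emit(self, address):
        """Output an address and latch address + 1 into the uPC register."""
        address &= self.ADDR_MASK
        self.upc = (address + 1) & self.ADDR_MASK
        return address

    def cont(self):
        return self._emit(self.upc)            # select uPC

    def jump(self, d):
        return self._emit(d)                   # select D

    def call(self, d):
        self.stack.append(self.upc)            # save return address
        return self._emit(d)

    def ret(self):
        return self._emit(self.stack.pop())    # select F

    def load_counter(self, n):
        self.counter = n & self.ADDR_MASK
        return self._emit(self.upc)

    def loop(self, d):
        """Branch to d while the counter is non-zero, then fall through."""
        if self.counter != 0:
            self.counter -= 1
            return self._emit(d)
        return self._emit(self.upc)
```

A call at address 0x011 to a subroutine at 0x100 pushes 0x012, so the later `ret()` resumes at 0x012, which is the return-address linkage the stack provides.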



Figure B3. Am2910A Microprogram Sequencer 



Am2964 Dynamic Memory Controller 



This circuit provides address multiplexing, refresh 
address generation, and RAS/CAS control for the 
dynamic RAM memories. It can address up to 256K 
and provides both 128- and 256-line refresh capability. 

1) Two 8-bit address latches and an 8-bit refresh 
address generator feed into a multiplexer for 
output to the dynamic RAM address lines. 

2) The RAS decoder allows 2 upper address bits to 
select one-of-four banks of RAMs. 

Figure B5 shows the internal diagram of this circuit. 
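
The address arithmetic implied above (two 8-bit multiplexed fields plus two bank-select bits, 18 bits or 256K locations in all) can be sketched as follows. The function name is hypothetical, and the assignment of the low byte to the row field is an assumption; the text does not say which latch receives which address bits.

```python
def split_dram_address(address):
    """Split an 18-bit address into row, column and one-of-four RAS select.

    Assumed bit assignment: bits 0-7 row, bits 8-15 column, bits 16-17 bank.
    """
    address &= 0x3FFFF                     # 2**18 = 256K locations
    row = address & 0xFF
    column = (address >> 8) & 0xFF
    bank = (address >> 16) & 0x3
    ras = [i == bank for i in range(4)]    # True marks the selected bank
    return row, column, ras
```

For address 0x2A5C3 this yields row 0xC3, column 0xA5 and bank 2 selected. The row and column bytes are what the multiplexer would present in turn on the eight DRAM address lines.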
Figure B5. Am2964 Block Diagram 



ADVANCED MICRO DEVICES 
DOMESTIC SALES OFFICES 



ALABAMA (205) 882-9122 
ARIZONA, 
  Tempe (602) 242-4400 
  Tucson (602) 792-1200 
CALIFORNIA, 
  El Segundo (213) 640-321 
  Newport Beach (714) 752-6262 
  San Diego (619) 560-7030 
  Sunnyvale (408) 720-8811 
  Woodland Hills (818) 992-4155 
COLORADO (303) 691-5100 
CONNECTICUT, 
  Southbury (203) 264-7800 
FLORIDA, 
  Altamonte Springs (305) 339-5022 
  Clearwater (813) 530-9971 
  Ft Lauderdale (305) 484-8600 
  Melbourne (305) 254-2915 
GEORGIA (404) 449-7920 
ILLINOIS (312) 773-4422 
INDIANA (317) 244-7207 
KANSAS (913) 451-3115 
MARYLAND (301) 796-9310 
MASSACHUSETTS (617) 273-3970 
MINNESOTA (612) 938-0001 
NEW JERSEY (201) 299-0002 
NEW YORK, 
  Liverpool (315) 457-5400 
  Poughkeepsie (914) 471-8180 
  Woodbury (516) 364-8020 
NORTH CAROLINA, 
  Charlotte (704) 525-1875 
  Raleigh (919) 847-8471 
OREGON (503) 245-0080 
OHIO, 
  Columbus (614) 891-6455 
PENNSYLVANIA, 
  Allentown (215) 398-8006 
  Willow Grove (215) 657-3101 
TEXAS, 
  Austin (512) 346-7830 
  Dallas (214) 934-9099 
  Houston (713) 785-9001 
WASHINGTON (206) 455-3600 
WISCONSIN (414) 782-7748 



INTERNATIONAL SALES OFFICES 



BELGIUM, 
  Bruxelles TEL: (02) 771 99 93 
    FAX: 762-3716 
    TLX: 61028 
CANADA, Ontario, 
  Kanata TEL: (613) 592-0090 
  Willowdale TEL: (416) 224-5193 
    FAX: (416) 224-0056 
FRANCE, 
  Paris TEL: (01) 687.36.66 
    FAX: 6862185 
    TLX: 20253 
GERMANY, 
  Hannover area TEL: (05143) 50 55 
    FAX: 5553 
    TLX: 925287 
  München TEL: (089) 41 14-0 
    FAX: 406490 
    TLX: 523883 
  Stuttgart TEL: (0711) 62 33 77 
    FAX: 625187 
    TLX: 721882 
HONG KONG, 
  Kowloon TEL: 3-695377 
    FAX: 1234276 
    TLX: 50426 
ITALY, Milano TEL: (02) 3390541 
    FAX: 3498000 
    TLX: 315286 
JAPAN, Tokyo TEL: (03) 345-8241 
    FAX: 3425196 
    TLX: J24064 AMDTKOJ 
LATIN AMERICA, 
  Ft. Lauderdale TEL: (305) 484-8600 
    FAX: (305) 485-9736 
SWEDEN, Stockholm TEL: (08) 733 03 50 
    FAX: 7332285 
    TLX: 11602 
UNITED KINGDOM, 
  Manchester area TEL: (0925) 828008 
    FAX: 827693 
    TLX: 628524 
  London area TEL: (04862) 22121 
    FAX: 22179 
    TLX: 859103 



NORTH AMERICAN REPRESENTATIVES 



CALIFORNIA 

I2 INC OEM (408) 988-3400 

DISTI (408) 498-6868 
CONNECTICUT 

SCIENTIFIC COMPONENTS (203) 272-2963 

IDAHO 

INTERMOUNTAIN TECH MKGT (208) 322-5022 

INDIANA 

SAI MARKETING CORP (317) 241-9276 

IOWA 

LORENZ SALES (319) 377-4666 

MICHIGAN 

SAI MARKETING CORP (313) 227-1786 

NEBRASKA 

LORENZ SALES (402) 475-4660 



NEW JERSEY 

TAI CORPORATION (609) 933-2600 

NEW MEXICO 

THORSON DESERT STATES (505) 293-8555 

NEW YORK 

NYCOM. INC (315) 437-8343 

OHIO 
Dayton 

DOLFUSS ROOT & CO (513) 433-6776 

Strongsville 

DOLFUSS ROOT & CO (216) 238-0300 

PENNSYLVANIA 

DOLFUSS ROOT & CO (412) 221-4420 

UTAH 

R2 MARKETING (801) 595-0631 



Advanced Micro Devices reserves the right to make changes in its product without notice in order to improve design or performance 
characteristics. The performance characteristics listed in this document are guaranteed by specific tests, guard banding, design and 
other practices common to the industry. For specific testing details, contact your local AMD sales representative. The company 
assumes no responsibility for the use of any circuits described herein. 